Monday, 26 May 2014

Psychology's real replication problem: our Methods sections

Replication has been a big topic in psychology recently, because a) we've suddenly realised we need more of it and b) several high-profile replication efforts have recently been published (e.g. the Many Labs project; see Ed Yong's summary). Last week Simone Schnall, one of the authors whose work failed to replicate in that project, wrote a blog post about her experience of being replicated. She has issues with the replication (specifically that the replication data showed a ceiling effect on the morality measures, which would obscure any differences), but one particular comment caught my eye:
Of course replications are much needed and as a field we need to make sure that our findings are reliable. But we need to keep in mind that there are human beings involved, which is what Danny Kahneman’s commentary emphasizes. Authors of the original work should be allowed to participate in the process of having their work replicated. (emphasis mine)
This is the idea that there is somehow a requirement to involve authors in efforts to replicate their work. It is nonsense; once you have published some work, it is fair game for replication, failure to replicate, criticism, critique and discussion. In other words, we're all allowed to science the hell out of your work any time we like. We don't need either your permission or your involvement: the only thing we (should) need is your Methods section, and if you don't like this, then stop publishing your results where we can find them.

Of course, getting the original authors involved can be very productive; you can chat about experimental details, make sure you have covered everything, and generally be collegial about the whole thing instead of adversarial. But there is no obligation to do this, and I'm surprised that people think there is one.

The idea seems to spring from the Kahneman commentary Schnall links to. The commentary in question is called 'A New Etiquette for Replication' and in it Kahneman proposes rules for running replications that he would eventually like to see enforced by journals and reviewers. The rules specify that a replicator must contact the original author ahead of data collection with a detailed experimental plan. The original author then has a limited period to get back to them with comments about things they should do differently; the replicator has to either go with these comments or explain why they didn't, and then submit all this correspondence along with their manuscript to show they did everything in good faith.

Kahneman's reasoning goes like this:
In the myth of perfect science, the method section of a research report always includes enough detail to permit a direct replication. Unfortunately, this seemingly reasonable demand is rarely satisfied in psychology, because behavior is easily affected by seemingly irrelevant factors. For example, experimental instructions are commonly paraphrased in the methods section, although their wording and even the font in which they are printed are known to be significant.

It is immediately obvious that a would-be replicator must learn the details of what the author did. It is less obvious, but in my view no less important, that the original author should have detailed advance knowledge of what the replicator plans to do. The hypothesis that guides this proposal is that authors will generally be more sensitive than replicators to the possible effects of small discrepancies of procedure. Rules for replication should therefore ensure a serious effort to involve the author in planning the replicator’s research.
This blows my mind. Apparently, psychology methods sections routinely do not include specifications of things we know to be significant factors in shaping the behaviours we study, and the solution is to make sure you talk to the original authors because they might be the only ones who know how to produce their effect. Only the authors of papers know all the magical little things they did that might have had an impact, and so people trying to replicate the study must work with the original authors in order to make sure they do these things too.

Here's a crazy alternative solution: how about we psychologists all agree to write Methods sections that would pass a first-year Research Methods course and include all the information required for replication?

I am all for being collegial and working together to find solutions to problems, and Kahneman is not wrong to observe that the big replication efforts have come off as a bit adversarial. But to set this up as a rule, as a norm of behaviour, is ridiculous. If you can't stand the replication heat, get out of the empirical kitchen: publishing your work means you think it's ready for prime time, and if other people can't make it work based on your published Methods, then that's your problem and not theirs. Let's remember who's responsible for what, people.

34 comments:

  1. Fully agree, just two comments: first of all, many journals have a strict word limit, so as authors we have to be very selective about which information to include. Which details of the procedure should we report in the Methods section, and what should we omit? Should we report all analyses that we ran? How many papers must we cite? Etc.
    Second, suppose we stumble upon a potentially interesting effect in an experiment, but the effect is so volatile that it disappears with every minor change in the experimental set-up; how robust is the effect in the first place?
    Cheers
    John

    ReplyDelete
    Replies
    1. John.... yeah... that's a BIG part of the problem. On the one hand, there is a legitimate sense in which we might still care: a chemistry synthesis that only works within a very narrow temperature range is still very interesting, so long as the methods section includes that temperature range. On the other hand, if the effect noted really results from the temperature, and not all the other things the scientist reports, then we have a problem. So if you have a real psychological effect that is only visible under carefully controlled conditions, it might be "robust" enough to deserve reporting. On the other hand, if you can explain the effect completely based on quirks in the methods, that is a very bad thing. All these problems are clearly visible, for example, in the criticisms of the infant looking literature: http://fixingpsychology.blogspot.com/2012/05/what-is-wrong-with-infant-looking.html

      Delete
  2. There is usually room in the Supplemental Files for a complete, detailed Methods section. This should be published with the original manuscript. We have to do it in molecular studies; why are psychological ones different?

    ReplyDelete
  3. If factors known to be "significant" are routinely left out of published articles, what's the point of publication anyway? If a published article is missing a crucial element that made the experiment work, then how is the article useful science for anyone (not just for replicators)?

    ReplyDelete
    Replies
    1. Indeed. There can of course be unexpected contributions from things you didn't think mattered, but these would get revealed over replications in various locations.

      Delete
  4. I think the other point is that the authors of the replication study did do further analysis to show that their negative replication was not due to the ceiling effect, but Schnall rejected this and then didn't mention either that analysis or the extensive and very courteous correspondence (as far as I can see) in her blog posts. Yes, in molecular studies the supplemental results and methods are often much longer than the paper in question, and journals have no problem putting these online.

    ReplyDelete
  5. On one level, I agree with your points, but I think you're far too quick to dismiss Kahneman as nonsense. We could use more humility in this debate altogether -- I think a lot of the vitriol comes from people's failure to really listen to the other side and give them the benefit of the doubt -- as they usually deserve -- before dismissing them.

    Science has very strict ideals. Scientists are humans. We aspire to uphold these ideals, but we often fall woefully short of them, even when we're not cutting corners or trying to gain some advantage at the expense of the truths we seek. As humans, scientists have real limitations, and if we fail to take those into account then we may be making strong arguments about the ideals of science, but pragmatically we may be making little progress.

    When Kahneman talks about the myth of perfect science he's acknowledging that things don't always go according to ideals and that we should be mindful of that. Perhaps researchers whose work is not replicated should not feel threatened -- some aren't, but should we really be surprised that some are? Sometimes formally, usually informally, their ability or integrity may be called into question, and their life prospects may be affected. If you're ready to run those people out of the field, then I think you'll find yourself pretty lonely.

    So an "etiquette" that doesn't demand, but suggests that the original authors are consulted seems like a reasonable way to move things in the right direction. Does science demand it? No, absolutely not. Can it help improve science right now? I think it probably can.

    What's more, if scientists write bad methods sections, then this approach has a good chance to change that too. It would certainly be embarrassing to have to send a 12-page addendum to researchers to tell them the additional steps needed to properly replicate your study.

    By making the field more receptive to replications, and increasing their number and scope, the underlying elements are likely to be improved as well.

    So what Kahneman says may not be something that is required for science to be done, but I certainly don't agree that it's nonsense.

    ReplyDelete
    Replies
    1. "By making the field more receptive to replications, [...]"

      This may be true, but I hope people who keep reiterating this argument realise that a field that is not receptive to replications utterly fails as the field of an empirical science.

      It's that simple.

      Really.

      Delete
    2. "What's more, if scientists write bad methods sections, then this approach has a good chance to change that too. It would certainly be embarrassing to have to send a 12-page addendum to researchers to tell them the additional steps needed to properly replicate your study."
      This I agree with entirely :)

      Delete
    3. This comment has been removed by the author.

      Delete
    4. Consider the following scenarios:

      1) you published an erroneous original finding
      2) you published an erroneous finding that cast doubt on a valid study that you had claimed to replicate.


      Are you equally upset in both cases? I am not, and I hope you are not. Perhaps this is nonsense, but I believe that the risk of falsely damaging the reputation of colleagues should count for something.

      Delete
    5. Thanks for stopping by.

      You don't solve Type I and Type II errors with etiquette; you mitigate the risks with statistical power and replications (plural; see the little simulation sketch at the end of this reply). The latter requires an adequate Methods section to work, and if someone fails to replicate my work in a good faith attempt because my Methods missed something important, then shame on me, not them.

      You're worried about people's reputations, and that's fair. My thought here is that a better solution to the problem is for psychology to stop getting pulled around in the wind of single results, and to be more interested in developing theoretically motivated programmes of work that can put the effects in question on a sound footing. Failing to replicate an effect in a good faith effort, when you actually did everything required, shouldn't instantly carry any more weight than the original paper; stuff takes time to shake out properly, as Matthew Rodger points out below.

      I am all for being collegial and working in ways that bring everyone along for the ride. My concern here was that your recommendation for fixing known inadequacies in psychology methods sections was not 'we should write better methods sections'. That really astonished me, and that's really all I was saying in this post (that, and the fact that once your work is out there it's fair game, because that's the actual game).
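
      [A quick illustrative aside: here's a minimal simulation sketch of that point about power and replications in the plural. The effect size, group sizes and alpha below are made-up numbers chosen for illustration only, not values from any of the studies being discussed; the only point is how often a single good-faith replication of a perfectly real effect comes up non-significant when power is modest.]

      # Purely illustrative sketch: how often does a single replication of a
      # REAL effect come up non-significant at alpha = .05? The effect size d,
      # the group sizes and the number of simulations are hypothetical choices.
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(1)
      d = 0.4            # assumed true effect size (Cohen's d); hypothetical
      alpha = 0.05
      n_sims = 10_000

      for n in (20, 50, 100, 200):                # participants per group
          misses = 0
          for _ in range(n_sims):
              control = rng.normal(0.0, 1.0, n)   # control group scores
              treatment = rng.normal(d, 1.0, n)   # treatment group; the effect is real
              _, p = stats.ttest_ind(control, treatment)
              if p >= alpha:
                  misses += 1                     # a 'failed' replication of a real effect
          print(f"n = {n:3d} per group: a single replication misses the effect "
                f"{misses / n_sims:.0%} of the time")

      (With d = 0.4 and n = 50 per group, power is only around 50%, so roughly half of honest single replications 'fail'; that's the argument for replications, plural, rather than treating any one result as decisive.)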

      Delete
    6. "So an 'etiquette' that doesn't demand, but suggests that the original authors are consulted seems like a reasonable way to move things in the right direction. Does science demand it? No, absolutely not. Can it help improve science right now? I think it probably can."
      This seems fine to me, so long as we remember that the actual responsibility being demanded falls on the original authors and their Methods.

      Delete
  6. [Reply to Fred Hasselman, having trouble with nesting comments]

    I don't think it's a question of pass or fail. It's a question of striving to improve and I think that's what most everyone is trying to do. The question is what's the best way to make the most progress. If we stagnate then, yes, you're right, the field will not be much of an empirical science.

    ReplyDelete
    Replies
    1. Striving to improve, yes, but it will have to be through objectivity and being much more explicit about predictions and methods, as discussed in this blog post.
      At the same time, we need LESS concern about the human emotions involved.

      I just spent some time reading the email exchange and blogs of the replication authors of the cleanliness study... I'm sorry, but the way this replication effort has been framed in blogs, tweets and blog comments, including by the special issue editors, is truly appalling. I am shocked.

      This framing of replication as bullying of the scientist who made a claim about how the world works will harm the reputation of our field even more than the news of another failed replication in psychology.

      So in my opinion the goal to achieve is: detach scientists from their scientific claims and turn them into the claim-falsifiers they are supposed to be.

      Tenure should be awarded to those who have proven most of their own claims wrong. Guaranteed that the claims still standing will be very interesting.

      Delete
    2. Thank you for this reply! I too was a bit shocked to hear all about "reputations" and "bullying" and I don't know what else in the replication discussion. I thought for a moment this wasn't about science at all, but your comment makes me keep some faith in it all.

      Delete
  7. I hate to say "everything I know about replication techniques in social psychology, I learned from Diederik Stapel", but here goes anyway:

    In the autobiography/confession which he wrote after being exposed as a fraud, Stapel describes how, as a grad student, he would write to the authors of published articles and ask them to send him the materials needed to reproduce their work. (Apparently replication was in fashion back then, in the early-mid 90s!) This was pre-email, or at least pre-PDF, so he would typically receive a big envelope of documents and questionnaires and floppy disks by post. But he says there was often a little extra: a Post-It note, or a handwritten letter, saying things like:

    “Don’t do this test on a computer. We tried that and it doesn’t work. It only works if you use pencil-and-paper forms.”

    “This experiment only works if you prime with ‘friendly’ or ‘nice’. It doesn’t work with ‘cool’ or ‘pleasant’ or ‘fine’. I don’t know why.”

    “After they’ve read the newspaper article, give the participants something else to do for three minutes. No more, no less. Three minutes, otherwise it doesn’t work.”

    “This questionnaire only works if you administer it to groups of three to five people. No more than that.”

    Stapel writes: "I certainly hadn’t encountered these kinds of instructions and warnings in the articles and research reports that I’d been reading. This advice was informal, almost under-the-counter, but it seemed to be a necessary part of developing a successful experiment."

    Now of course, he doesn't have the world's greatest track record of honesty, but I don't see any reason for him to make this up. You can certainly see why remarks like those above wouldn't make it into the average published Methods section.

    PS: Full disclosure: I am the authorised translator of Stapel's book.

    ReplyDelete
    Replies
    1. "You can certainly see why remarks like those above wouldn't make it into the average published Methods section."

      Isn't that problematic? Especially when considering the conclusions drawn in the article itself, which I feel are often not very nuanced.

      Delete
    2. Sounds like a problem to me.

      Delete
    3. Absolutely, it's problematic, hugely so. But it seems to me that to include any of those sentences in the methods section submitted to a journal would result, at best, in a suggestion by the reviewers to remove them, and at worst, in rejection because the effect isn't general enough. (Ironically, there might be a really useful study waiting to be done, to work out *why* people need a break of exactly three minutes, or give different answers on the computerised version of the test. It could be that the crud on the inside of the dirty test tubes that we use contains the most interesting molecule in the lab.)

      As Simmons et al. noted in their "False-Positive Psychology" article: "The redacted version of the study we reported in this article fully adheres to currently acceptable reporting standards and is, not coincidentally, deceptively persuasive. The requirement-compliant version ... would be—appropriately—all but impossible to publish."

      Delete
    4. Nick... yeah... exactly! The function of a methods section in chemistry is to allow another lab to reproduce your results the day after the publication comes out, with no need to contact you. The function of a methods section in many areas of psychology is to increase how convincing the overall sell of the paper is. To help that out, reviewers and editors are happier when authors lie (with lies of omission being routine and often encouraged). That is a HUGE problem.

      Delete
  8. This comment has been removed by the author.

    ReplyDelete
    Replies
    1. I agree that psychology (and behavioural/neuroscientific sciences generally) could do well to improve the reporting of methods, but I think it is also important to be completely clear why this is needed, as it seems to me a lot of this ‘replication debate’ is wandering off-topic.
      Yes, we need clear and complete methods to make replications possible, but we want replications to be possible so that we can better understand the way that the world/mind works.
      A given psychological experiment is a historical event – something happened. People were put in certain environments, given certain instructions, various things were varied or kept constant, and certain behaviours were recorded. We look at how the behaviour changed with the things that were varied, relative to the things that stayed constant, in the context of a particular theoretical system. From this we try to determine the causal network of environment, mind and behaviour that can explain what happened and predict whether it might happen again. This is psychological science in a nutshell.
      Unfortunately, due to many factors (experimenter influence, insufficiently sensitive measures, individual differences in participants’ intentions or abilities, etc…), sometimes this strategy will produce an effect and sometimes it won’t. This is why we need to replicate experiments – to increase our confidence that an observed effect (or non-effect) is the result of real causal processes and not merely a fluke. ***Note: this is also why a single replication of an experiment that does not produce the same effect carries exactly as much weight as the original experiment, assuming the methods are identical.***
      Where this becomes interesting in terms of methodology is when two labs conduct the same experiment and achieve different results. If this happens consistently, we might start to think that there is a very important methodological difference between these two sets of experiments that is driving the difference in results. That is exactly where the microscope now needs to be pointed! Whatever difference in methodology causes these differences in results may be a very important causal factor in human psychology, rather than a source of endless recriminations. If the factor can be identified (and this is why I agree with the argument for clear and complete methods sections), then a new experimental design might be employed that aligns with this causal factor to produce stronger (and, dare I say, more replicable) effects. Moreover, it may also be better controlled for when re-testing the original theory.
      A cumulative science ought to care very deeply about the details of why one experiment ‘worked’ and another did not. It ought to try to get to the bottom of it and not dismiss one lab as employing ‘good methods’ and another as employing ‘bad methods’. Undoubtedly, this process is made easier when methodologies are fully explained and available, but there is also the possibility that important factors were involved that the original researcher didn't know played any part (and hence were not reported) but which later prove to be causally important. If these are revealed by replication attempts, happy days! We now have a chance to learn something new about human psychology.
      I realise this is Scientific Logic 101, but I think it is helpful during these debates to revisit why anyone should want to conduct or replicate a psychology experiment in the first place.

      Delete
    2. Matthew... Yes, that thing you said!

      A failure to replicate should NEVER, by itself, lead to incrimination. The best scientists, working in the best possible ways, should occasionally publish results that fail to replicate, and that doesn't make them bad scientists.

      On the other hand, if there is a lab that consistently publishes things that do not replicate, or entire literatures that fail to replicate, or methods sections in which variables known to be key were left out intentionally to avoid scrutiny... well, that's bad for the field.

      What ultimately matters is getting a handle on the phenomena in question.

      So there is a fine line, and it is important both not to villainize people just because something did not replicate, and to quickly ferret out situations in which results will fail to replicate.

      Delete
  9. I'm interested in people's thoughts on the idea of supplementing a published method section with a video-walkthrough of the procedure for a study, that could then be publicly posted (Brian Nosek and colleagues suggested this in one of their papers). Seems like it would solve a lot of problems when it comes to replication and would enhance the credibility of one's own work.

    ReplyDelete
    Replies
    1. It's not a bad idea. They would be easy to host permanently on YouTube, and someone coming in could spot things no-one else thought were an issue.

      Delete
  10. Let me first state that I am not a psychologist and have no expertise in this field, but I find these debates about scientific standards very interesting.

    I fully agree with what you said, but I'd probably go even a bit further: maybe it is actually harmful to collaborate with the authors of the paper you're trying to replicate, because this may contaminate your work with all kinds of hidden effects, from trying to be more friendly to the work of people you cooperate with, to the fact that they may (without realising it) give you hidden instructions on how to replicate their errors.

    That said, whatever you do, it'd probably be a good idea to document if and how you've collaborated with the authors you're replicating.

    ReplyDelete
  11. Andrew... I'm a bad person for not reading the rest of the comments before I post this (and I will go back through and read them later tonight), but...

    1) Of course the methods sections suck! And yeah, it's a big problem. You should see the infant looking time lit. Recall "Reason to be suspicious #5" from http://fixingpsychology.blogspot.com/2012/05/what-is-wrong-with-infant-looking.html which has my favorite example, one that actually was mentioned in a footnote: an object permanence effect only reached significance if the disappearing and reappearing toy carrot had a face on it!

    2) These same proposals would be absurd in any other discipline. Imagine if chemists were told that they had to contact the original author before attempting a published synthesis! Ugh! I wrote a comment related to these points a few months ago that made it through to the review stage at American Psychologist (an impressive feat in itself), but was not accepted. Any interest in trying to help with a revision? The ability to quote Kahneman might be enough of a boost on its own.

    ReplyDelete
    Replies
    1. Sure; send me the draft. Based on the activity on this topic this week in my Twitter feed, blog, and all over the internet generally, this message is resonating with people. I'd love to get this (what I think is a fairly uncontroversial point) into the formal literature.

      Delete
  12. I agree.

    As I wrote elsewhere (http://wp.me/p315fp-tm), it should be possible to reproduce work without the original authors' help. For that purpose, all journals should require authors to upload all data (not just send it on request), and full descriptions of the data, procedures etc. in an online supplement. Ideally, anyone should be able to reproduce the work independently without ever contacting the author.

    In addition, it is not clear to me why an original author should have the right to analyze the replication and review the manuscript before publication. Once the original paper (and data) is published and enters the scientific dialogue, it is not the ‘property’ of the original author, but belongs to all. In the past, authors have responded to replications of their work, which I call a replication chain – but after publication.

    ReplyDelete
    Replies
    1. Nicole, very cool blog!
      I think some of the problems in psychology would benefit from all of your suggestions, but others only from a subset of them. Typically in psychology, the idea of uploading data is a totally separate issue, because replication means "get new data". That is, most of these disagreements don't have to do with conflicting understandings of a single data set; they have to do with new data sets not coming out the same way. Of course, there ARE lots of disagreements about how best to analyze a data set, but those arguments are not usually phrased in terms of "replication".

      In contrast, I am now working in a lab with a bunch of economists, and their situation seems very similar to the ones you describe. Large data sets magically appear on the interwebs, and they run their models. If someone else can't get the same results from the data, that's a big problem, and they will want to see the exact data set and code the authors used. Again, this sometimes happens in psychology, but it is not nearly as common.

      I like to give authors the benefit of the doubt, and so if I failed to replicate someone's study, I would assume that their data and their analysis reflected the published report. My question would be whether one of us had hit that 1 in 20 fluke, or if there was something about the methods of data creation that caused the difference. Only if I had strong reason for further suspicion would I wonder if they had done the analysis improperly or fudged their data.

      Delete
    2. "Once the original paper (and data) is published and enters the scientific dialogue, it is not the ‘property’ of the original author, but belongs to all."
      I agree, but Rolf Zwaan correctly points out that people in social psychology often consider their work to be theirs and to require their skills and knowledge to run properly.

      Delete
  13. Thanks for the post; I fully agree with you. Anyone who publishes a research study should be ready to have it scrutinized and replicated. The Methods section should contain enough information, including all significant factors, to make replication possible. Talking to the original author undermines the principle of replicability, because you are then relying on the author, not the experiment itself, to get the same results.

    ReplyDelete