Thursday, 15 November 2012

Psychological Science...meet me at camera 3

Psychological Science, I think we need to talk. I was reading this farewell from your outgoing editor, and it would all be nice enough if I hadn't also just read your latest offering to the altar of 'embodied' cognition. Frankly, it made me wonder whether you actually read all the things you publish.

Robert Kail, the outgoing editor, had this to say about the ideal Psychological Science paper:
...the ideal Psychological Science manuscript is difficult to define, but easily recognized — the topic is fundamental to the field, the design is elegant, and the findings are breathtaking.
There are a few problems here; 'breathtaking' results tend to be wrong, for example, and while I'm all for elegant design, sometimes, to make a breathtaking claim, you need to run those 4 control conditions. But my main problem is less with these criteria and more with the papers that apparently meet them.

I've talked a lot about 'embodied' cognition research here, work that I don't think deserves the name and that is typically full of flaws anyway. My two main examples have been how thinking about the future or the past makes you sway forwards or backwards, and how leaning to the left makes the Eiffel Tower seem smaller. Both of these (quite flawed) papers appeared to great fanfare in Psychological Science. In fact, quite a lot of this work appears in Psych Science, to the point where I nearly made 'published in Psych Science' the first item on my checklist for spotting embodied cognition papers that aren't the real deal. So I don't have a lot of confidence in the selection process at Psych Science to begin with.

This week it got even worse. Kille et al. (2012) published a study in which they had people sit on wobbly chairs. They then asked people to rate the stability of some celebrity relationships, and to rate how important they felt several qualities were to a relationship, including several related to stability. People in the wobbly-chair condition rated celebrity relationships as less stable than people in the non-wobbly-chair condition (effect size, partial eta squared, of .15). They also showed a stronger preference for stability-related traits in relationships (effect size of .1). In both cases, the mean difference was about half a point on a 9-point scale. The authors argue that 'embodiment motivates mate selection preferences'. Oh, and people on wobbly chairs were happier, for which there is no explanation. (I think it means that if you are in a stable relationship already, sit on a wobbly chair and enjoy the happiness bump. I mean, why not?)

When I look to Kail's criteria, I wonder how papers like this one get published.

Embodied cognition is a fundamental topic, but these papers aren't taking it seriously.

The design was elegant, in that it produced a simple pattern of results. But there was no serious discussion of why these results should have happened. There was instead a brief note at the end that said:
Indeed, we suspect that one reason cognition may become embodied is to ensure that one's needs - which may arise from physical states - are met through goal pursuit.
Following this logic through, they are claiming that we have evolved to try and resolve temporary postural instability by selecting more stable mates. This makes no sense at all when you spell it out, but it sounds so exciting the way they said it. I guess that's why they said it their way rather than mine.

Are the findings breathtaking? Well, I did get a little breathless after reading this, but not in the way Kail intended (unless he meant from simultaneously laughing and banging my head against the table for 20 straight minutes). But to be serious: these results are meaningless because, like all this research, there is no clear effort to establish the mechanism by which the postural manipulation and the judgement data are connected to one another. Why would a minor and easily corrected postural instability affect the important task of mate selection (or even the judgement task they are using as a proxy for mate selection)? With no clear task analysis of how this extended, embodied cognitive system is formed and why, we cannot interpret these results.

Finally, this is just more 'small effect size' research, which I've previously argued is not as clever as people think it is. There's a perception in psychology that squeezing out a significant result means you ran a clever experiment and managed to defeat the complexity that is human cognition. I think it's actually a hint that you have asked the wrong question, because when you manage to ask the right question, you go from tiny effects to unambiguous results. Small effect sizes are not compulsory for psychology; if we get better at asking our questions, we will get clearer answers.

So how did this get published in Psych Science? It's a good question, and one I don't actually know the answer to. If I had to guess, I'd say it's because it's yet another sexy little result in 'embodied' cognition, and sexy sells. But sexy is hurting our discipline - these tiny, astonishing effects aren't being replicated, are probably wrong, and are, to my mind, fairly average science anyway.

What about all the other papers Psych Science publishes?
I pick on the 'embodied' stuff because I know what to look for. There are, of course, good papers in Psych Science. Karen Adolph, my favourite developmental psychologist, recently published some great work in this journal about the emergence of walking in infants, and I'm sure some of the other papers outside my field are good too. But my first reaction on seeing the paper wasn't 'hey Karen, great job'; it was 'oh Karen, why did you waste that awesome paper on Psych Science?'. Based on interactions on Twitter, I'm not the only person who has come to view what is supposed to be our flagship journal with pity and contempt, even when what it publishes is good.

I'm going to guess that isn't what Kail wanted for the journal, and not what the incoming editor Eric Eich wants either. But until the journal stops publishing all this poorly conceived 'sexy' work, I will never choose Psych Science as a home for any of my papers.

Post Script
Based on the response on Twitter to this post (which ranged from 'I agree!' to 'damn, that was harsh... but I agree!') I'm not alone in my Psych Science fatigue. The question then is: what do we do about it? It's supposed to be the APS flagship, our premier journal, but we just don't seem that into it. So how can we, the psychological science community, fix it?

References
Adolph, K. E., Cole, W. G., Komati, M., Garciaguirre, J. S., Badaly, D., Lingeman, J. M., Chan, G. L. Y., & Sotsky, R. B. (2012). How do you learn to walk? Thousands of steps and dozens of falls per day. Psychological Science, 23(11), 1387-1394.

Kille, D., Forest, A., & Wood, J. (2012). Tall, dark, and stable: Embodiment motivates mate selection preferences. Psychological Science. DOI: 10.1177/0956797612457392

24 comments:

  1. Well said. I recently had a paper desk-rejected from Psych Science because they were "not convinced that the findings you report represent the sort of trailblazing discovery that we seek". You've got to laugh!

    In the end, we published it in PLoS ONE.

    Is Psych Science part of the ongoing replication effort? If so, I wonder how it will fare.

    ReplyDelete
  2. It is part of the replication effort (together with the Journal of Personality and Social Psychology and the Journal of Experimental Psychology: Learning, Memory, & Cognition).

    ReplyDelete
  3. The prediction market doesn't seem to have high confidence in replicability for the selected Psych Science ones. We'll see how good we are at predicting....

    ReplyDelete
    Replies
    1. I have definitely set my money on the JEP:LMC ones. Now we wait and see.

      Delete
    2. I kind of have a similar heuristic. A little scary....

      Delete
  4. Feels good to read this after having to read a bunch of BS high-impact embodied cognition papers by e.g. Schwarz in a seminar.

    ReplyDelete
  5. I agree with your general assessment that focusing on sexy results hurts the field. I never really thought of Psych Science as a particularly bad purveyor of that, though I don't really read much of the embodied stuff. Other journals jump out at me as much worse in that regard, such as Cognition. So what "good" psych journal would you recommend? Psych Review, JEP:General? Or are they all running off a "sexy science" cliff?

    ReplyDelete
    Replies
    1. I'm sure there are other journals doing the same things. Any journal chasing 'sexy' to maintain an impact factor is at risk of this. Psych Science is just the home of most of the embodied nonsense I'll have to find a way to kill one day, so it's on my list.

      I like the JEPs - they are designed to take detailed and methodical explorations of questions. That said, they are under pressure to pick up the pace and get sexier - I've run into their push for 'high impact' work recently (JEP:HPP), and so have colleagues in the field. They need to remain a home for the good stuff, though; we need them.

      Delete
    2. I actually recently reviewed an 'embodied' paper. Had to go back and reread some of your posts.

      Delete
    3. Glad they came in handy :)

      Delete
  6. I cannot agree with you more. Why is it that some of this social psychology or embodied cognition fluff gets in with small effect sizes, yet we are all familiar with the "better suited for a specialty journal" garbage tagline for work submitted with large effects? I expect to see large effects reported by a premier journal, not fluff that makes a great headline for Psychology Today.

    ReplyDelete
  7. I take all your points except the one on small effect sizes. Some effects are small and that's it; it does not mean that the hypothesis was wrong... Some small effects can have strong societal benefits... Also, one should be careful not to confuse the size of a mean difference and the size of an effect (you can have a 0.5 difference between means and a large effect size, and conversely a 4-point difference and a small effect size)...

    ReplyDelete
    Replies
    1. I wasn't conflating the two here, although I also happen to not be all that impressed with a half-point change on a 9-point proxy scale for mate selection.

      Read my post on small effect research. My point is that I know big effects are a) possible and b) a sign that you've hit the right question bang on the head. Effects that are small are, in general, not that interesting; what did you have in mind as a counterpoint to that?

      Delete
    2. Some counterpoints are obvious, but they are usually in very applied situations... which this obviously is not. It would be different if they had found, for example, that putting wobbly chairs in nursing homes reduced the rate of heart disease, or reduced the prevalence of schizophrenia. (You remember the effect where a mildly vibrating floor helped old people keep their balance better?)

      The problem is that, especially in these papers, Psych Science is promoting small effect sizes of no clear importance either to the field or to society. There is no crucial hypothesis being tested, and no implications for any type of practice. Without one of those factors in its favor, you should have to have a hell of a strong effect to get into a "flagship" journal.

      Delete
    3. (You remember the effect where a mildly vibrating floor helped old people keep their balance better?)
      No - sounds interesting! Do you have a reference?

      Delete
  8. Interesting commentary. I agree on a number of points, including the statement that "breathtaking" findings are more likely to be wrong (e.g., http://wp.me/p1DHGS-2d).

    But I too was puzzled by the comment about small effect sizes. Kille et al. report a group difference of d=0.8 for the mate preference effect - which is typically considered a large effect in psychology.

    In fact, the problem with many flashy studies is that the reported effects are too large, not too small. With N=47, Kille et al. could not have found a significant effect unless it was at least medium-to-large. So we don't know if the true effect is really that large, or if the observed effect capitalized on chance.
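    To put a number on that, here is a minimal power sketch in Python using statsmodels (the roughly equal split of N=47 into ~24 per group and the conventional 80% power are my assumptions, not figures from the paper):

        # Smallest Cohen's d detectable with ~24 per group at alpha = .05
        # and 80% power, two-sided two-sample t-test.
        from statsmodels.stats.power import TTestIndPower

        min_d = TTestIndPower().solve_power(nobs1=24, alpha=0.05,
                                            power=0.80, ratio=1.0,
                                            alternative='two-sided')
        print(round(min_d, 2))  # ~0.83, i.e. medium-to-large by convention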

    I also agree with the anon above. In some research domains, there's a crud factor so a small effect wouldn't be very interesting. But in other domains even a small effect can be interesting, either if it has practical significance or if you have a strong theory that predicts something different.

    ReplyDelete
    Replies
    1. Kille et al. report a group difference of d=0.8 for the mate preference effect
      No they don't; p. 2 has a couple of ANOVAs and the effect sizes are .15 and .1.

      With N=47, Kille et al. could not have found a significant effect unless it was at least medium-to-large. So we don't know if the true effect is really that large, or if the observed effect capitalized on chance.
      This is kind of my point. I routinely find meaningful, interpretable and functionally relevant significant effects in my work with Ns of 6-10; in my coordination research I have a successful model and a coherent theory to guide me.

      I am interested though: what small effects are you all thinking of that are actually interesting?

      Delete
    2. d != eta^2. eta^2, r, and d are all on different scales. Judging eta^2 by the standards of d is like saying that somebody whose height is 80 cm is "tall" because 80 inches is tall.

      If you go to this page (http://www.uccs.edu/~lbecker/) and plug in the reported means and SDs from the mate-preference analysis, you get d=0.82 or r=.38. r^2 is a variance-explained metric like eta^2, and .38^2 = .144, so it's all consistent -- just different metrics.
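      If you'd rather check the arithmetic yourself, here's a minimal sketch in Python (the d value is the one computed above from the reported means and SDs; the d-to-r conversion formula assumes equal group sizes):

          import math

          d = 0.82                     # Cohen's d from the reported means/SDs
          r = d / math.sqrt(d**2 + 4)  # standard d-to-r conversion, equal n
          print(round(r, 2))           # ~0.38
          print(round(r**2, 3))        # ~0.144 - the variance-explained scale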

      Re your other question, one of the ways that small effects become theoretically or practically interesting is if they get aggregated over many instances. An economic (rather than psychological) example is the "house edge" in casinos. In blackjack the correlation between which side of the table you're sitting on (dealer vs. player) and expected gain/loss on a single hand is quite small - the house edge is about 1% (which is a correlation of approximately r=.02). Yet casinos make billions of dollars by aggregating that advantage over countless instances (see the sketch below).

      (A more psychological example of a small effect that is interesting because of aggregation is the Facebook voter turnout experiment: http://www.scientificamerican.com/article.cfm?id=facebook-experiment-found-to-boost-us-voter-turnout)
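      A toy simulation makes the aggregation point concrete (the 1% edge and unit bets here are illustrative numbers, not real blackjack odds):

          import random

          random.seed(1)
          edge, n_hands = 0.01, 1_000_000
          # The player wins a unit with probability (1 - edge)/2, else loses one.
          profit = sum(1 if random.random() < (1 - edge) / 2 else -1
                       for _ in range(n_hands))
          print(profit)  # about -edge * n_hands = -10,000 units for the player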

      Another reason small effects can be interesting is when theory or prior research suggests that the DV is hard to move. An intervention that moved mood by d=0.2 would not be terribly interesting. An intervention that reliably increased intelligence by d=0.2 (if that's the true, unbiased effect size) would be a big deal.

      Along similar lines, consider psi research. The problem with most psi research is inadequate controls, biased reporting, etc. -- not small effect sizes. People weren't looking at the Bem psi study and saying, "Well, he did everything right experimentally but meh, the effects were significant but small." If a robust and convincing design showed a replicable small-but-nonzero precognition effect, I'd indeed find that breathtaking.

      Delete
    3. How does accounting for about 14% of the variance become a large effect??

      Aggregating small effects is fine, when that's an option; but this small effect on a proxy mate-selection task can't aggregate with anything. It stands alone; that makes it not that interesting and, frankly, highly unlikely to be something that actually affects mate selection.

      Delete
  9. Much of the problem you correctly identify within Psych Science is due to the (very) brief report format. For years psychologists have known that psychology papers published in Science seem not to hold up very well. One reason is the lack of detail in the Method sections and the failure to replicate and examine the results in follow-up experiments, as would be required in the Journal of Experimental Psychology or JPSP, for example. Breathtaking results often turn into less surprising results when you've gotten to the bottom of them.

    ReplyDelete
  10. I wonder if 'breathtaking' psychological results should be considered likely wrong because 1) everyone is an 'expert' lay psychologist closely familiar with human behaviour (their own & others) 2) their breath will only be taken by a surprising result 3) so breathtaking results are those that clash with 'expert opinion'...

    ReplyDelete
  11. Nice post. When I first started my blog, one of my goals was to post a monthly "Most obviously absurd paper from Psych Science". Usually the title is enough to identify the target, though sometimes you have to read the abstract. Maybe this will inspire me to finally do that.

    However, I must say you failed to fully unpack the absurdity of the target article. You said:

    Following this logic through, they are claiming that we have evolved to try and resolve temporary postural instability by selecting more stable mates.

    But you are still letting them get away with "stable." How about:

    They claim that we have evolved to try to resolve the problem of sitting on tilting rocks by selecting mates less likely to cheat on you.

    Now that sounds like the future of our field... trailblazing indeed! ;-)

    ReplyDelete
  12. I'm locking horns right now with the author of another Psych Sci paper, and was just informed by the APS that

    "We strongly encourage authors to share their data, but we do not have an official policy at APS that requires them to do so."

    I feel like APS journals should be boycotted (both in terms of submissions and citations) by the scientific side of psychology, until they clean up their act.

    ReplyDelete
  13. I couldn't agree more... PS is happy to publish correlations as causal relationships if there is a sexy enough "theory" behind them.

    ReplyDelete