Wednesday, 9 December 2015

Thoughts on Ding et al (2015) "Cortical tracking of hierarchical linguistic structures in connected speech"

I happened to be reading Cummins' (2000) paper “‘How does it work?’ vs. ‘What are the laws?’: Two conceptions of psychological explanation” when my Twitter feed announced that Chomsky was right and we do have grammar in our heads after all. The Twitter buzz concerned a new Nature Neuroscience paper by Ding and colleagues called “Cortical tracking of hierarchical linguistic structures in connected speech.” You can find it online here. Curious whether I needed to completely overhaul my understanding of language, I tracked down the paper and read it this morning. The method employed is sensible, the results are fairly clear, and the analyses seem legit (though I’m not a neuroscientist). So, why am I not worried that everything I thought I knew about language is wrong?
The problem isn’t with the effects demonstrated in the paper (those seem fine); the problem is with the explanation for those effects. Luckily, I had just read Cummins, so I have a lot of handy quotes to pull from to illustrate this argument. Below, I’ve stitched together a passage from Cummins (2000) that summarizes the challenge for explanations in psychology:

“A substantial proportion of research effort in experimental psychology isn't expended directly in the explanation business; it is expended in the business of discovering and confirming effects…In psychology, we are overwhelmed with things to explain, and somewhat underwhelmed by things to explain them with…Most journals want reports of experiments. Explanation, such as it is, is relegated to the ‘discussion’ section, which is generally loose and frankly speculative compared to the rest of the paper. Discussion sections are often not read, and their contents are almost never reported in other articles. The lion's share of the effort goes into the experiments and data analysis, not into explaining the effects they uncover…This is not mere tradition or perversity. It derives from a deep-rooted uncertainty about what it would take to really explain a psychological effect…It is striking that, while there is an extensive body of doctrine in psychology about the methodology appropriate to the discovery and confirmation of effects, there is next to nothing about how to formulate and test an explanation.”

The question for Ding et al is whether the explanation they propose for their effects is the best or only explanation.

At the simplest level, what do their results show? They show that cortical activity tracks the phrasal structure of a sentence independent of prosodic information that co-varies with phrasal structure. So, how do the authors explain this effect? They say:

“Our findings demonstrate that processing goes well beyond stimulus-bound analysis: cortical activity is entrained to larger linguistic structures that are, by necessity, internally constructed, based on syntax…[O]ur findings provide unique insights into the neural representation of abstract linguistic structures that are internally constructed on the basis of syntax alone…Although the construction of abstract structures is driven by syntactic analysis, when such structures are built, different aspects of the structure, including semantic information, can be integrated in the neural representation” (p 6)

My summary of their conclusion is that phrasal structure isn’t present in the stimulus, but the brain tracks it so it must be internally generated. The tracking of phrasal structure is based on syntax. Once the structures are “built” semantic information can be added to them.

Is this the best explanation for the observed effects? The best explanation for any set of results is one that captures the causal structure responsible for the effect (Craver, 2007). Such explanations might not be possible in every case because of limitations in our experimental methods for exploring a phenomenon, but this kind of explanation should always be the goal (Cummins, 2000; Craver, 2007; Bechtel & Abrahamsen, 2010). I can’t fault Ding et al for not reaching this bar, but it does mean that the explanation they propose is definitely not the best explanation.

The next thing to assess is whether it’s the best current explanation. The best current explanation may not capture the causal structure underlying the effect, but it is helpful to have such causal structure in mind.

As Cummins says,

“Ultimately, of course, a complete theory for a capacity must exhibit the details of the target capacity’s realization in the system (or system type) that has it. Functional analysis of a capacity must eventually terminate in dispositions whose realizations are explicable via analysis of the target system. Failing this, we have no reason to suppose we have analyzed the capacity as it is realized in that system…It is well known that if it is possible to accommodate the data at all, it is possible to accommodate it with a theory that says nothing whatever about the underlying mechanisms or their analysis, i.e., in a way that has no explanatory force whatever (Craig, 1953; Putnam, 1965)” (emphasis added)

This may seem like an odd criticism to bring up in the context of a neuroscience paper that purports to have identified the neural correlates of a hypothesized cognitive process. But, a fully realized explanation of this effect (which, we’ve allowed above, is the best explanation) will account for causal relations between the auditory information containing the sentences and resultant neural activity. At the syllable level we have no problem. Syllables are distinct in the auditory signal and neural activity tracks their frequency. It is reasonable to propose that syllable structure in the auditory signal causes corresponding activation in cortical networks.

Things are more complicated at the phrasal and sentence levels. Ordinarily, prosodic cues in language co-vary with higher-order structures in language. These cues were removed in Ding et al’s stimuli. Thus, they argue, there is nothing in the signal that is informative about phrasal and sentence structure. This poses a problem for discovering the causal relations between the stimulus and consequent neural activity. If there is really nothing at all in the signal relating to phrasal structure, then nothing in the stimulus can cause brain activity that corresponds to phrasal structure. But something must be causally responsible for the neural activity corresponding to phrasal and sentence structure. The authors suggest that this something, whatever it is, is generated (read: caused) internally. Okay, fine. But something needs to cause the internal generation and, whatever that is, it must relate to the stimulus, since the neural activity corresponding to phrasal structure is locked in time with the auditory presentation of the sentence.

If I were to venture a guess, I would suggest that the authors’ claim about there being no phrasal information in the stimulus is mistaken. I would point out that syntax and semantics are not fully independent and that the chunks of meaning in utterances could stand in for abstract syntactic features like phrases and sentences. But, the authors would explicitly reject this idea, not just because they don’t think that semantics cue structures, but because they think semantics only come into play after the structures have been identified.

“[C]onstruction of abstract structures is driven by syntactic analysis, when such structures are built, different aspects of the structure, including semantic information, can be integrated in the neural representation” (p 6).

Their experiment 7 attempted to control for semantics, to some extent, since the syllables were not meaningful, but still participated in a regular structure. However, the syllables were systematically related to sentence position, which is a stimulus property that could structure neural activity after learning. So, perhaps acoustic information for words (or meaningless syllables in a learned grammar) causes changes to neural activity that, embedded in a particular context where this signal follows some signals and not others, cause neural activity at the phrase level?

Why should anyone prefer this explanation (which is admittedly pretty shoddy) over the one proposed by Ding et al? The primary benefit of this explanation is that it is consistent with a causal chain of events linking neural activity to purported stimulus properties. Whether this is a correct chain of events is another story. But, in this respect, this explanation is not at a disadvantage compared to Ding et al's. 

At the heart, there is an ambiguity in what this cortical tracking of phrasal features is meant to be doing anyway. Is it the activity responsible for our ability to understand different levels of structure in speech (is it a constituent of language comprehension)? Or, is it the activity that builds the structures which then go on to give us the ability to understand different levels of structure in speech (is it a cause of language comprehension)? In either case, we must ask the question, what causes the activity? If it is internally-generated, what precipitates the internal activity? Presumably, due to the correspondence between cortical activity and stimulus presentation, something external.

This is all just speculation, though it demonstrates the types of questions the authors would need to address to develop the best explanation of the effect of interest. More importantly, though, none of this speculation requires me to invoke anything like an internal grammar. There appears to be no evidence to support the strong claim that “abstract linguistic structures are internally constructed on the basis of syntax alone.” Indeed, syntax (which is an imposed formal description for a property of a messy natural system, anyway) cannot be the cause of constructed syntactically-defined structures.

This is a problem because it suggests that the authors' explanation for their effects is not going to fit well into a full, causal mechanistic explanation. In summary, it is definite that the explanation provided by the authors is not the best explanation for the cortical tracking of phrasal structure (since the best explanation will refer to the causal structure responsible for the effect). It is also clear that it is not the only explanation, as I've just rattled off an alternative that seems plausible. But, most importantly, it is also likely not the best current explanation, as there isn't a clear path for building a causal explanation out of the one advanced by the authors.

Bechtel, W. & Abrahamsen, A. (2010). Dynamic mechanistic explanation: Computational modeling of circadian rhythms as an exemplar for cognitive science. Studies in History and Philosophy of Science Part A, 41(3), 321-333.

Craver, C. F. (2007). Explaining the brain. Oxford: Oxford University Press.

Cummins, R. (2000). "How does it work?" versus "What are the laws?": Two conceptions of psychological explanation. Explanation and cognition, 117-144.


  1. The idea that syllables are simply present in the acoustics (or movements) is simply false. Even an account of syllable perception must lean upon shared linguistic knowledge.

    (Disclaimer, no relation of Cummins 2000).

  2. Maybe - this isn't my specialty. I'll read the article you linked to and I might have more to add after that. But for now I'd ask you why you think they found neural tracking at the syllable level for an unfamiliar language?

  3. Thanks for sharing that paper, Fred. I, too, had concerns about Peelle and Davis making strong claims about the role of the speech envelope. Since the speech envelope is an abstraction, I wondered what the actual information was in the signal that supported the entrainment. I also agree completely about your point that the rhythm in language is very different than the periodicity of neural oscillations and that what you refer to as shared knowledge (and what I'd refer to as previous experience shaping the neural response to information) is critical in understanding how our brains respond to language.
    I have a question about this.
    My question is whether regular oscillations that correlate somewhat with the frequency of syllable production (which is admittedly variable) might emerge via the interaction of the brain (which has its own dynamic characteristics) with the ongoing speech signal? Or, could the reported oscillations simply be the result of averaging done during data analysis, meaning that, if we were able to get a good recording of a single trial we might observe a signal that more accurately tracks the temporal characteristics of the speech signal? These two possibilities would suggest that we might observe something like regular activity in the range of syllable frequency without there being unambiguous information about syllables in the speech signal.

    I'll add that I have no skin in this game. I don't think any particular problems arise for us if syllables aren't clearly present in the signal. I took that as a given in the post above because it seemed intuitive and it allowed Ding et al a starting point for their arguments.
