Comments on Notes from Two Scientific Psychologists: Reply to Graziosi: In detail

Eric Charles · 2016-09-08 14:39

Oye! I look at the fascinating discussions that happen when I take a break!<br /><br />The answer to the question re Shannon is that there are two different types of "error" that are typically not distinguished in these discussions, and that causes much of the tension. One is the statistical equivalent of random measurement error; the other is an error of measure validity. It may be the case that Tau exactly specifies time to impact, but you are measuring it imperfectly (measurement error). That point can't really be denied, as any particular organism in any particular circumstance will have its imperfections, and the ecological psychologist shouldn't be bothered by such a point in the slightest, because that point still leaves Gibson's story perfectly intact. (Especially as the organism can adjust in-process to correct such errors if it stays perceptually engaged.) <br /><br />It would be a different thing altogether to argue that time-to-impact cannot be perceived, and must only be inferred, based on mental calculations, from imperfectly available "cues". In that case, you still have error of measurement, but there is the much more salient source of error that there is nothing available to the organism which specifies the thing you are interested in, because specification has been a priori deemed impossible. (For a historic example, this line of thinking might be based on arguments about the inherent limitations of the "retinal image".)
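The first sense of "error" can be made concrete with a toy simulation (my own illustration, not from the thread, with made-up numbers): Lee's tau, the optical angle divided by its rate of change, exactly specifies time-to-contact for a constant approach speed, yet a perceiver sampling the angle with noise gets an imperfect estimate of a perfectly specified quantity.

```python
import math
import random

def theta(d, size=0.5):
    """Optical angle subtended by an object of physical size `size` at distance d."""
    return 2 * math.atan(size / (2 * d))

def tau_estimate(d, v, dt=0.01, noise_sd=0.0, seed=42):
    """Estimate time-to-contact as tau = theta / theta_dot from two angle samples.

    noise_sd models imperfect measurement of the (perfectly specifying) angle.
    """
    rng = random.Random(seed)
    th1 = theta(d) + rng.gauss(0, noise_sd)           # sample now
    th2 = theta(d - v * dt) + rng.gauss(0, noise_sd)  # sample dt later (closer)
    theta_dot = (th2 - th1) / dt
    return th1 / theta_dot

# Object at 10 m closing at 2 m/s: true time-to-contact is 5.0 s.
exact = tau_estimate(10.0, 2.0)               # tau specifies TTC...
noisy = tau_estimate(10.0, 2.0, noise_sd=1e-6)  # ...noise only perturbs the measurement
print(exact, noisy)
```

The specifying variable is untouched by the noise; only the pickup of it is degraded, which is the point that leaves Gibson's story intact.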
If we are forever stuck as imperfect perceivers of imperfect cues, then the Gibsonian story falls apart. <br /><br />There is an in-between position in which one argues for "error" by which we could be attuned to the wrong things. For example, one could acknowledge that Tau specifies time-to-contact, but have an experiment that shows a given organism is flinching as a function of some other optic pattern. That argument doesn't necessarily break the Gibsonian story, but it takes us into the evolutionary and developmental side of things, which is not as well developed. We would not be, as in the case above, FOREVER stuck as imperfect observers of imperfect cues; we are just temporarily in that state, regarding the particular invariants in question. <br /><br />Most computational models mush the two different types of error together, not considering that there might be excellent theoretical reasons to keep them separate.

Sergio Graziosi · 2016-07-31 16:05
(continued 4/4)<br />Thus, the coherent picture sketched above allows for the potential of doing the same. For example, we might find that treating neurons as logic gates allows us to predict the behaviour of actual (biological) neural networks in terms of both internal firing and outputs.<br /><br />In empirical terms, if we can predict the above, we would have a strong indication that we are indeed honing in on the invariant stuff that counts, which, in our new overarching theoretical framework, happens to be, surprise surprise, nothing more than EI (the structural difference which makes the ecologically relevant difference)!<br />In other words, when you say "the nervous system doesn't take information onboard, it resonates to that information" you are right in a way, but it is an unhelpful observation. In the framework I'm proposing, employing the computational lens allows us to produce empirical hypotheses on the purely mechanical level (via PP, but also following other more classic frameworks). That's because it focusses on the need that organisms have to systematically extract what counts for conserving homoeostasis, and this can only be taken from regularities found in the environment. Thus, by building a coherent picture which can be viewed through different lenses, we allow empirical cross-checking of the same hypotheses.<br />For example, let's say we hypothesise that organism X in condition Y responds to Tau in respect to Z. You guys can design experiments, manipulate perceivable Tau and see if the hypothesis holds. You can then pass the ball to wet-lab people, ask them to open up X (ugh!) and find nervous signals which appear to follow the dynamic properties of Tau as measured above, and see at what level of processing the correspondence is maximal (according to my framework, it would be where input starts becoming output).
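The "neurons as logic gates" treatment mentioned above has a classic minimal form, the McCulloch-Pitts threshold unit; here is a toy sketch (my own illustration) of how single units realise Boolean gates:

```python
# A McCulloch-Pitts unit: fires (1) iff the weighted sum of its binary
# inputs reaches the threshold. Different weights/thresholds give
# different logic gates from the same mechanism.

def mp_neuron(weights, threshold):
    """Build a binary threshold unit over len(weights) inputs."""
    return lambda *inputs: int(
        sum(w * x for w, x in zip(weights, inputs)) >= threshold
    )

AND = mp_neuron([1, 1], 2)   # needs both inputs active
OR  = mp_neuron([1, 1], 1)   # needs at least one
NOT = mp_neuron([-1], 0)     # inhibitory input

print(AND(1, 1), OR(0, 1), NOT(1))  # -> 1 1 0
```

Whether real neural populations are usefully predicted by such units is exactly the empirical question raised above; the sketch just shows what the computational lens abstracts to.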
If they can, they will then be able to pass the ball on to theoretical neuroscientists, asking them to produce a PP model of the levels involved.<br />The result would be a complete mechanistic explanation of the observed behaviour, coupled with a synthetic description of the crucial passages. Because of this last step, we would also be able to design a system which behaves in the same way, using whatever mechanisms we may see fit, thus producing a final prediction that "a system which implements the same computations will react to Tau just like our original organism". If this also works, you'll end up with a complete and really hard to refute picture.<br />In the process, you would have included and reconciled traditionally opposing factions, which may be a strategic advantage as it doesn't require proving other people "wrong". (Unlike what you propose in the "Are we Infomation Processers?" post.) If you wish, you may take the bait and remark that I am telling you that you are wrong. Yes, to some extent I am, but only on a limited scope, and I do think that I'm preserving the stuff you really do care about.<br /><br />Conclusion: the information-centric view has, if I'm right, theoretical, practical and strategic advantages - it is also inescapable, if you wish to communicate your results(!). It does come with plenty of other strategic dangers, but I won't discuss these today (I am sure you can spot them, probably better than I can!). What I'm interested in right now is hearing where I've lost you, and/or whether you do or do not see the potential that is getting me over-excited.<br />As per my initial caveat, I'm not sure if any of the above can help you with the paper, so apologies for steering OT...

Sergio Graziosi · 2016-07-31 15:56
(continued 3/4)<br />Third, we may <a href="http://psychsciencenotes.blogspot.co.uk/2015/02/are-we-infomation-processers-brief-note.html" rel="nofollow">observe</a> that (classic)SI is "an abstract description of how to reduce uncertainty between a sender and a receiver". But hang on: if EI is perceived, and is (or readily becomes) a representation, isn't it almost equivalent to (classic)SI? To me, yes it is. What is missing in the case of EI is an explicit sender; what is different is that the code used for the transmission is naturally occurring and not designed (it is solely determined by the regularities of the physical world). The first consequence confirms something I've written already: this is why you can measure, quantify and textually represent EI. Both EI and (classic)SI are part of the same conceptual lineage, so concepts translate well across levels.<br />Furthermore, we can now say that perceiving EI reduces uncertainty in the receiver. I.e. when I perceive something, I'll know that something else isn't happening. This is important, because it opens up yet another theoretical framework, the one which is prediction-based. In this view, lossy compression is just another name for reduction of uncertainty, and reduction of uncertainty is one and the same as increased prediction power.<br />The reason why this (tautological) passage is interesting is that prediction-based theories (with Clark, I'll call it "predictive processing" - PP) allow us to link back to actual physical mechanisms. At the beginning, we had pSI, solidly linked to physical states. Going up the hierarchy, we kind of lost touch with physical mechanisms, but, for biological machinery, Friston's work allows us to interpret it (stuff which has the ultimate function of preserving homoeostasis) in terms of Free Energy and thus prediction/reduction of uncertainty.
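The claim that perceiving reduces uncertainty in the receiver is literal Shannon arithmetic; a tiny worked example (my own invented numbers, purely illustrative): the information gained is the drop in entropy of the receiver's distribution over world states.

```python
import math

def entropy(ps):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in ps if p > 0)

# Four equally likely world states before perceiving anything...
prior = [0.25, 0.25, 0.25, 0.25]
# ...and a sharply peaked distribution after perceiving a specifying variable.
posterior = [0.97, 0.01, 0.01, 0.01]

gain = entropy(prior) - entropy(posterior)
print(f"uncertainty reduced by {gain:.2f} bits")
```

Knowing that one state obtains is the same as knowing that the others don't, which is the "when I perceive something, I'll know that something else isn't happening" point in quantitative form.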
You can then construct a picture that goes all the way up to explicit knowledge, and at each step PP allows us to describe the process in terms of physical mechanisms (actual molecules bouncing around!). At the same time, you'll be showing how the same process is reducing uncertainty more and more, producing (on the flip side) more and more "abstract" knowledge.<br /><br />A crucial passage in all this is represented by your paper: it addresses the philosophical dilemma of intentionality, and helps make the important point that cognition is <i>necessarily</i> embodied. It does so by allowing us to piece all the elements into a coherent framework. I'm saying that your work is the glue that sticks all of the above together (hence my excitement).<br /><br />Anyway, the fourth reason is about the computational view. In your own post (linked at the top of this comment), you say: <br />"<i>Behaviour simply is the activity of the kind of embodied system that we are in the presence of that particular information. [...]<br />Therefore, in the radical embodied, ecological approach, it makes no sense to say that cognition involves information processing.</i>"<br />And you are both right and wrong. As I've argued <a href="http://sergiograziosi.wordpress.com/2016/05/22/robert-epsteins-empty-essay/" rel="nofollow">here</a>, the behaviour of a computer is also "the activity of that kind of embodied system in the presence of that particular information". Computers are made of mechanisms which shift from one state to the other in lawful ways. Therefore, we could coherently assert that "it makes no sense to say that computers compute". Hang on a second: isn't this absurd? We do look at computers in computational terms, because it's useful... So what exactly makes it useful?
The answer once again is lossy compression: in considering computations, we eliminate all the irrelevant physical details and consider only the invariant stuff that counts, i.e., in this case, pure symbolic information.<br />(continues....)

Sergio Graziosi · 2016-07-31 15:50
(continued 2/4)<br />Now consider EI and the distinction between kinetic (out there) and kinematic (perceivable) properties. Like the DNA example above, what matters is that the structural relations are preserved from one into the other. Kinematic properties can be useful only when they lawfully correspond to kinetic properties of the environment. Exactly like DNA representations, they work because they preserve structural relations. Thus, via Landauer, we can measure the SI potential (pSI) of a given physical system (via fundamental physics, we may theoretically enumerate all possible arrangements of the single components in the system), which will happen to represent the upper bound of all information available. Identifying EI then adds ecological constraints about what is perceivable and what is ecologically relevant. In SI+ terms, we look for structures which can make a difference to the biological subject we're investigating (these differences apply both to what is perceivable and what is ecologically relevant). Also, do note that structures can be, and often are, manifest over the time domain, not necessarily over space alone - i.e. kinematic!<br />So far, we have a theoretical reductionist picture of information encompassing all our definitions. At the bottom, we have potential SI, via Landauer: it is potential because it doesn't imply an observer/decoder. One notch above, we (theoretically) have (almost)SI+, which excludes all the differences that don't make a difference, so it's a bit sterile (if a difference doesn't make any difference, it is by definition undetectable and therefore not a difference that can be accounted for). However, if we go just one step beyond, and look at differences that can make a difference to some other structure/system/organism, lo and behold, we get EI (or, if I may, SI+) straight out of the box.
A couple of notches upwards, we then encounter "classic" SI, when signals are transmitted between sender and receiver. When the receiver is biological, these signals necessarily pass through a stage supported by EI/SI+ alone, as the receiver needs to be able to perceive them. <br />You may agree, but you may also reply: "OK, but why do we care?".<br />A few reasons. You ask about defining Tau in SI terms, and why it may help. Well, if you design an experiment, measure something which allows you (the experimenter) to derive Tau, maybe even manipulate it for experimental reasons, and then check whether the organism studied actually responds to Tau and not something else, you are, willingly or not, already defining Tau in SI terms. If time to contact according to Tau is 3 seconds, and you write it down in your paper, you (not me) are already doing what you're asking me to do. Thus, I don't need to; you've done the work already. <br />The importance of all of the above is, to start with, about theoretical coherence. In other words, I'm suggesting a way to stop <a href="https://cognitioninaction.files.wordpress.com/2016/06/ecological-information-workshop-kingston-june-2016.pptx" rel="nofollow">saying</a> that "<i>[EI] does not mean what most people mean when they talk about information (sorry)</i>". It may not immediately correspond to what lay people mean by information, but it _underlies_ it without exception. Everything we normally refer to as "information" eventually gets implemented (embodied?) in some EI form.<br />The other reasons are about useful conceptualisations.<br />First of all, if we look at the proposed reductionist hierarchy, pSI, SI+/EI, [...] and (classic)SI, there is a quantitative gradient: pSI ≥ SI+/EI ≥ [...] ≥ (classic)SI. This directly links to the conceptualisation I'm proposing based on lossy compression.
It may or may not be useful, but it directly allows us to tap into work on decision making & control in engineering, meaning that this move allows us to expand the pool of theoretical tools we can deploy in empirical investigations. More options and tools sounds like a positive development to me.<br />(continues...)

Sergio Graziosi · 2016-07-31 15:46
Sabrina and Andrew,<br />I'll try to keep this conversation in here, to avoid too much fragmentation; after all, I am hoping to help, so your blog is where the discussion belongs. (I will however fragment today's reply into multiple comments: it's long!)<br /><br />As per Andrew's suggestion, I'll try to tackle one point at a time, starting with Shannon's information, my way of reframing it and why I think it may help. One important disclaimer: I've mulled over these matters since I wrote my first reply, and kept confabulating with more intensity after reading your reply. The main concern I have is that I can't separate my own thinking from your proposals, they are so tantalisingly close! This however has a dangerous consequence: I cannot find a way of not advertising/pushing my views, which in turn may make my replies less and less useful to you. This wouldn't be the intended effect, so I'll need you both to feel absolutely free to push me back on track, forcefully and explicitly, 'cause when I get excited I might become hard to steer. Agreed? I know it may feel awkward, but please don't be shy.<br /><br />FYI, your paper has been mentioned a few times on Conscious Entities (also by yours truly) <a href="http://www.consciousentities.com/2016/06/flatlanders/#comment-531838" rel="nofollow">here</a> and <a href="http://www.consciousentities.com/2016/07/digital-afterlife/#comment-538148" rel="nofollow">here</a>; you may want to keep an eye on the latter thread as it might evolve into a lengthy/interesting conversation (covering the philosophical deep end of our discussion).<br /><br />On information (Shannon's, let's call it SI, with my addendum SI+). The more I think about it, the more it seems to me that SI+ and Ecological Information (EI) are so closely related as to be almost indistinguishable. To see why, I need to make explicit the part played by "structure".
My shorthand definition is "SI+ = a structural difference that makes a difference". In the OA, I write that the difference made needs to depend on how stuff is assembled: if assembling the same blocks in different ways makes the result interact with the rest of the world in different ways, then the structural differences (how our medium is assembled) count as informational. The typical example is DNA: in protein synthesis, ATG stands for methionine but TGA is a stop codon. Same ingredients, different structure -> different effects. Crucially, the difference in effects is mechanical; a stop codon does what it does because of molecular interactions, nothing more. However, we can identify the structural differences, and by doing so we make studying DNA easier (we can represent it in terms of sequence and forget all the details - we deal with the informational content of DNA, leaving its embodiment aside, when convenient). In the lab I could plan changes and effects of DNA sequences using "just" the letters, and then re-translate my plans into real-world DNA to see/measure the actual effects. Manipulating DNA sequences <i>on paper</i> is possible because it preserves the structural relations that happen to make the difference (that is: the difference we happen to care about). During "on paper" manipulations, the symbols A, T, C and G stand for instances of the actual nucleotides - they are genuine representations, if there ever were any. On-paper manipulations of DNA are useful/possible because we can capture what makes a difference and represent it in terms of pure information. We represent structural relations "A, then T, then G" and forget about the actual physical structure. IOW, we focus on the invariant part (the structural relationships) and discard the implementation details.
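The "on paper" manipulation described above can be sketched in a few lines (a tiny excerpt of the real genetic code; the table and helper are just for illustration): same letters, different arrangement, different effect, all handled purely at the level of structural relations.

```python
# Minimal slice of the standard genetic code: ATG initiates translation
# (methionine), TGA terminates it, TGG codes for tryptophan.
CODON_TABLE = {"ATG": "Met (start)", "TGA": "STOP", "TGG": "Trp"}

def translate(codon):
    """Look up a codon's effect using only its letter structure."""
    return CODON_TABLE.get(codon, "?")

# Same ingredients (A, T, G), different structure -> different effects:
print(translate("ATG"), "|", translate("TGA"))  # -> Met (start) | STOP
```

Nothing molecular happens here, which is the point: the sequence representation preserves exactly the structural differences that make the biochemical difference.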
Please keep this concept of the "invariant side" in mind, as I'll be using it again later on.<br />(Continues)<br />

Andrew · 2016-07-25 13:48

The thing I currently do not know is how to characterise something like tau in Shannon terms, and what you get from that analysis. The geometrical, ecological analysis points to the specific real structural feature in the optic array that specifies time-to-contact; what does the Shannon analysis buy me beyond that? If someone could actually do such an analysis and show me what it tells me, that would help.<br /><br />I think we might basically agree on the filtering issue; I just like to try and keep the actual mechanism in mind at all times these days because it often makes a difference. It may not here.

Sergio Graziosi · 2016-07-25 13:37
Andrew,<br />(thanks!)<br />I'm with you on the high-level <i>functional</i> description of how regularity can drive learning, while white noise can't. I am also with you in saying that "many bits as inputs -> something -> less bits as output" is not (and does not try to be) a full mechanistic explanation of anything.<br /><br />However, I'm not with you on two important (I think) additional points. I'll frame both as questions.<br /><br />1. Shannon's info has limits, yes, agreed. That's why I felt the need to push it beyond its original domain. However, <i>because</i> it links to fundamental physics, once pushed a little further, it has the potential of grounding a full mechanistic model. Hence my interest.<br />If the approach works, it will allow us to avoid having to model each particle in detail and just deal with structural properties (Shannon's info+), which seems like a very welcome possibility.<br />In this context, I'm struggling to make sense of why the approach can't be used to start modelling perceptual systems. Will we need more ingredients? Sure (EI being my preferred candidate!). However, this need for more "ingredients" also applies to the G & G approach, so what's the catch you are avoiding by sticking exclusively to the latter? [My bet: some catch which applies to classic Cog.Sc., but doesn't apply if one links SI+ to EI straight away.]<br /><br />2. <i>I guess at one level you do need to identify that there is filtering required in order to get to this analysis; but the Shannon description of the final system remains inappropriate because it implies active filtering remains something the system has to do every time it interacts with an optic array.</i><br />Which forces me to ask: in a mechanistic model, what's the difference between active and passive filtering? <br />On one level it's just stuff interacting: this neuron fires, contacts other neurons and so on. So you just don't find "filtering".
If you shift to the S's informational view, you get that the filtering which happens IS one and the same as responding to (some of) the regularities in the input. If you wish, I'm merely repeating the first question from the opposite perspective: what's the insurmountable obstacle we hit by choosing this approach? <br /><br />Or, let me reassure you: no, I am not implying any special, separate and additional "active filtering". I'm trying to explain why I think that modelling perception can be simplified by deploying the filtering/compression concept in an ad-hoc manner. I'm also saying that doing so shows where the tension between classic Cog.Sc. and the ecological approach comes from, which (if true) would be very good news (to me).<br /><br />Does the above help a little?<br />

Andrew · 2016-07-25 09:59

Post wherever suits you best; just link us to anything you write! I think we might be at the point where taking things one at a time might help though; try to clear things in order so we don't spend too much time at cross purposes.

Andrew · 2016-07-25 09:58
Some quick thoughts:<br /><br /><i>IOW, we have input (some energy patterns coming from the lake - lots of bits, in formal Shannon's terms), device, output. Output is binary, one bit, signifying ripples detected or not. Thus overall, "detecting the signal" is inevitably equivalent to keeping the relevant stuff and disregarding the irrelevant. Detecting the signal is actually the same thing as removing the rest. This is an important concept which I've linked to "lossy compression", and one which applies to decision making as well, so I'd need to make sure you appreciate the point. Do ask if it's unclear or unconvincing.</i><br />I get that this is the Shannon-style <i>description</i>. But the issue is that this does not describe the causal chain of events that a perceptual system goes through in order to resonate to the information and not the noise. So this (to me) again points to the limits of applying Shannon-style analysis to perceptual systems (Sabrina's been interested in the overlap for a while, but I'm much more suspicious that it's going to work out).<br /><br />The ecological analysis comes from EJ Gibson and her work on learning; an early key reference is <a href="https://monoskop.org/images/3/3f/Gibson_JJ_and_E_1955_Perceptual_Learning_Differentiation_or_Enrichment.pdf" rel="nofollow">Gibson & Gibson (1955)</a>. <br /><br />The idea is that you have a perceptual system that can learn to resonate to a sufficiently stable signal, but that this learning process takes time. The optic array (for example) present in a given task dynamic is constantly transforming, but contains stable higher-order relational features (invariants-over-transformation) that are EI information variables created by the lawful interaction of light with surfaces and their properties. Given this set-up, the only thing the perceptual system can learn to resonate to are the things that are stable for long enough for the learning, attunement process to take place.
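The timescale point can be sketched with a toy learner (my own illustration, not a model from the literature): a unit that slowly strengthens its coupling to whatever pattern is currently present, with no filtering step anywhere in the mechanism.

```python
# A leaky coupling that drifts toward the current input with a small
# learning rate. A persistently present invariant accumulates influence;
# a short-lived transient barely registers and then decays away.

def attune(presence_stream, rate=0.05):
    """Return final coupling strength after exposure to a 0/1 presence stream."""
    coupling = 0.0
    for present in presence_stream:
        coupling += rate * (present - coupling)  # drift toward current input
    return coupling

stable    = [1] * 100            # invariant present for the whole exposure
transient = [1] * 3 + [0] * 97   # "noise": present for only 3 ticks
print(round(attune(stable), 3), round(attune(transient), 3))
```

Nothing in the loop filters anything out; the noise simply fails to persist long enough to matter, which is the sense in which the filtering is "embodied in the design".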
The noise hits the system but simply fails to have any long-term consequences, because it's too short-lived. <br /><br />So while the effect can be described as 'filtering the noise and latching onto the signal', the actual system is not actively filtering the noise; there's no computational overhead, for example. The filtering is kind of embodied in the design of the perceptual system. <br /><br />I guess at one level you do need to identify that there is filtering required in order to get to this analysis; but the Shannon description of the final system remains inappropriate because it implies active filtering remains something the system has to do every time it interacts with an optic array.

Sergio Graziosi · 2016-07-24 17:17
Sabrina,<br />thanks for taking me so seriously! I can't begin to explain how I find your reply both humbling and gratifying. (I am not deploying standard or empty courtesy, I really am pleased!)<br /><br />Because of length limits, I'll limit myself to the bird's-eye view. Besides being pleased, my immediate reaction to your reply was:<br />"Oh dear, we really talk and think in different languages, how will we ever understand each other?"<br /><br />Luckily, on subsequent readings I found more and more reasons to be optimistic: there is much we agree on.<br /><br />Before tackling organisational matters, I'll mention what I think are the main disagreements: "just" two! The aim is to check whether you also recognise what follows as the meat of the argument. If you do, this should help us organise the debate.<br /><br />Disagreement (D1): you think it's relatively easy to solve the framing problem; I think it is very hard instead.<br />Disagreement (D2): you don't think approaches based on predictive processing can be of help; I do.<br /><br />Organisational<br />(1): we are all very busy, but have a lot of stuff to discuss. My own problem is brainpower: if I rush my replies I know I will be sloppy and either miss the point or just make mistakes. This would aggravate the problem, making us waste time/energy in clarifying misunderstandings or correcting mistakes. I'd like to avoid that, but the only solution I can see is to proceed slowly. That means you'll have to wait for my replies, which might not suit you in case you wish to revise and resubmit relatively quickly while using this discussion.<br />(2): length limits on comments here are/can be an obstacle. Should I reply on my blog again? (Totally OK for me.) A part of me thinks we should meet and talk, but that might be another organisational nightmare, or undesirable for other reasons. I'm London-based, so not unreachable ;-).
No strings attached, as usual.<br /><br />Back on the subject, I think that D2 becomes relevant only if we agree that the framing problem is indeed a big problem. Thus, when I write my longer reply (or replies? Ouch!) I'll concentrate on D1, unless you direct me in other directions.<br /><br />Today, I can make a technical point and finish with a minor detour. <br />In your reply, you mention the following a few times (following my own repetitions!):<br /><i>The point is, no active filter or extraction process needs to be assumed. Let’s let everything in.</i><br />Before that, you write:<br /><i>The cog psy question is whether this measurement device could, in principle, detect the ripple without having to separate the wheat from the chaff (without having to keep the relevant stuff and disregard the irrelevant)?</i><br />IOW, we have input (some energy patterns coming from the lake - lots of bits, in formal Shannon terms), device, output. Output is binary, one bit, signifying ripples detected or not. Thus overall, "detecting the signal" is inevitably equivalent to keeping the relevant stuff and disregarding the irrelevant. Detecting the signal is actually the same thing as removing the rest. This is an important concept which I've linked to "lossy compression", and one which applies to decision making as well, so I'd need to make sure you appreciate the point. Do ask if it's unclear or unconvincing.<br /><br />On D2, did you stumble upon my own two posts on the <a href="http://sergiograziosi.wordpress.com/tag/bayesian-brain/" rel="nofollow">Bayesian Brain</a>? If not, please have a look. The first could help with the head-scratching; it may help you see that yes, I'm after (full) mechanistic explanations, nothing else would satisfy me. The second should help because it hints at something I should tackle explicitly: theory building is facilitated by interesting tautologies, unlike empirical verification.
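The lake device above can be written down almost literally (a toy sketch with invented numbers and an invented energy criterion, not a serious signal-processing model): many input bits in, one output bit out, and the "detection" just is the discarding of everything else.

```python
# Many-bits-in, one-bit-out: collapse an arbitrarily rich energy pattern
# from the lake into a single bit ("ripples detected or not").

def detect_ripple(samples, threshold=0.5):
    """Return 1 if mean absolute energy exceeds the threshold, else 0."""
    band_energy = sum(abs(s) for s in samples) / len(samples)
    return int(band_energy > threshold)

calm = [0.01, -0.02, 0.03, 0.0, -0.01]   # near-still water
wavy = [0.9, -1.1, 1.2, -0.8, 1.0]       # clear ripples
print(detect_ripple(calm), detect_ripple(wavy))  # -> 0 1
```

Whatever the internal mechanism, the input-output mapping is a lossy compression down to one bit; the disagreement in the thread is about whether that description illuminates or obscures the mechanism.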
This also applies to the problem with "a (structural) diff that makes a diff": yes, it can apply to virtually everything; that is <i>why</i> (I hope) it is useful! Just hints, as there is no space for the full argument.<br />Enjoy!<br />Sergio