Thursday 21 July 2016

Framing the Debate

In 2014 we published a book chapter with Eric Charles in which we argued that the most important thing psychology and neuroscience needed from people like us was a new language in which to talk about the problems we are trying to solve. Our Ecological Representations paper is part of this, and we have a much larger paper in development laying out the more complete set of conceptual tools needed to do ecological psychology across a wider range of problems.

One reason why this is important is a simple fact: we are asking psychology to change, and it is up to us to clearly articulate what we want it to change into, or else nothing can happen. A related reason is that without a clear framework, we can't reformulate the questions in a useful way. We end up stuck, unable to explain something like 'theory of mind', when the actual solution is that ToM doesn't exist and so doesn't need explaining. Ecological neuroscience, for example, will look very different to cognitive neuroscience.

A final reason is that the language in which psychology frames its understanding of behaviour drives popular understanding of behaviour too. I recently came across my favourite example of this in a tweet by Alice Dreger.
Dreger, for some reason, spends most of her life only using her right eye, even though her left is perfectly functional. She blogged about it here. Every now and again, something makes her left eye kick in and she suddenly has stereo vision.

What caught my eye here is that her description of her experience is grounded in the myth that you need two eyes in order to perceive in 3D (I bug my students about this in class every year too). The myth is based in the standard image-based analysis of vision which I'll lay out below; but the point I want to make here is that people still describe their experience of monocular vision as 'not being able to see 3D/depth' even though this is inarguably, demonstrably not what is happening in their visual experience. It's like blind echolocators talking about how the sound creates 'an image in their minds'; this is just not the case, but this is the language psychology has provided them for talking about the perceived experience of spatial layout. What fascinates me is that it's trivial to demonstrate that monocular vision allows for 3D perception, but everyone lets the framing override their own experience. This, to me, is a big part of why our work right now is important - we will never make progress until we can reframe the debate.

Image Based Vision of a 3D World
The standard story about vision that we tell ourselves and the general public is that we have two eyes that work like cameras. They focus light onto the retina to create a retinal image, and these two images must somehow be combined so as to yield a 3D percept. The most detailed version of this story is of course Marr (1982). 

If each eye only takes in an image, then visual perception begins with two slightly offset 2D views of the same scene. Computationally, what you have to do is align these images so that you can match corresponding features in the images. For example, you have to identify that this corner of object 1 is at coordinates (x1,y1) in the left eye and that the same corner is at coordinates (x2,y2) in the right eye. Solving this correspondence problem is a non-trivial task, but there are many elegant and wonderful computational solutions in the machine vision literature. Once you have aligned the two images, however, you can use the disparity to compute the location of each feature in a 3D space; hey presto, 3D vision! With only one image, none of this is possible and you are left with a flat image containing only unreliable cues to distance (such as height in the visual field). 
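
To make the image-based story concrete, here is a minimal sketch of the last step, turning a matched feature's disparity into a distance. It assumes the textbook simplification of two identical, parallel pinhole cameras with already-aligned images; the function name and every number in it are illustrative assumptions, and nothing here is a claim about what a visual system actually does.

```python
def depth_from_disparity(x_left, x_right, focal_length_px, baseline_m):
    """Distance (in metres) of one matched feature in an aligned stereo pair.

    Assumes two parallel pinhole cameras: depth = focal length * baseline / disparity.
    """
    disparity = x_left - x_right  # horizontal offset between the two images, in pixels
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_length_px * baseline_m / disparity


# Hypothetical values: 800 px focal length, cameras 6.5 cm apart.
near = depth_from_disparity(420.0, 380.0, 800.0, 0.065)  # large disparity (40 px)
far = depth_from_disparity(405.0, 395.0, 800.0, 0.065)   # small disparity (10 px)
print(near, far)
```

The larger the disparity, the nearer the feature. Note that the function needs two x-coordinates for the same feature; with one image there is nothing to subtract, which is exactly where the 'one eye means no depth' conclusion comes from.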

This analysis leads you to conclude that if you only have one functioning eye, you are unable to recover depth from the image and you are therefore unable to perceive in 3D. This is the story Dreger has heard and uses in her understanding of her experience.

Information Based Vision of a 3D World
Gibson's insight was that vision is not based in images - there is no retinal image anywhere in the visual system. Instead, vision and visual experience are the result of sampling an already structured optic array. This structure is caused by the lawful interactions of light with the surfaces of the world and it is a rich and well behaved source of information about those surfaces. (By well behaved I mean that as you move, this perspective structure changes in smooth, regular ways that reflect your changing status relative to the surfaces in question.) 

Part of the structure of the optic array is inherently about distance and depth. For example, as you move, optical speed smoothly decreases as distance increases. This motion parallax (video demo) is information about relative depth (i.e. perceiving the ordering of things in depth, although not the exact locations). You can then calibrate this information by acting on the world. You can in fact generate enough optic flow to perceive separation in depth with eye movements alone (Bingham, 1993)!
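
The relation between optical speed and distance can be sketched in a few lines. This is only the basic geometry for a single point of observation moving in a straight line past stationary surfaces, under my own simplifying assumptions; the function name, the walking speed and the distances are all made up for illustration.

```python
import math


def optical_angular_speed(observer_speed, distance, bearing_rad=math.pi / 2):
    """Angular speed (rad/s) of a stationary point as seen by a moving observer.

    bearing_rad is the angle between the direction of travel and the direction
    to the point; the default (90 degrees) is a point directly off to the side.
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    return observer_speed * math.sin(bearing_rad) / distance


# Hypothetical scene: four surfaces off to the side, observer walking at 1.5 m/s.
distances_m = [20.0, 1.0, 5.0, 2.0]
speeds = {d: optical_angular_speed(1.5, d) for d in distances_m}

# Ranking by optical speed (fastest first) gives the ordering in depth (nearest first).
depth_order = sorted(distances_m, key=lambda d: speeds[d], reverse=True)
print(depth_order)
```

The ordering falls out of relative optical speeds from a single point of observation, and it would be the same at any walking speed. Getting from ordering to actual distances is the part that needs calibrating.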

One consequence of this is that you can sample this prestructured optic array in a bunch of ways and still support the same functional behaviour. You can sample it with two eyes with overlapping fields of view, like humans, or with an eye on each side of your head like a rabbit (no overlap). You can even sample it with a compound eye in which the option of an image doesn't even arise (like an insect). And each of these systems can access much of the same information and thus support the same behaviour; in this case, all of these ways of sampling the optic array support the perception of a 3D environment (e.g. Bingham & Stassen, 1994 on monocular vision; Duchon & Warren, 2002 on insects).

There are consequences for having different systems; stereo vision allows some optical structures to be detected that require overlapping fields of view, and some of this is about depth and distance. So it's well established that the perception of visual space and layout is more stable with stereo vision than with monocular viewing (I even found evidence of this in my recent throwing paper). But the point remains that all these systems support some level of 3D perception.

A Simple Demonstration
This is trivial to demonstrate for yourself. Close one eye. Does the scene in front of you suddenly flatten out? Does that picture on your wall suddenly appear to be at the same distance as the coffee cup on your desk? Are you unable to perceive which things are farther away than other things? The answer is 'No'! Monocular vision does not suddenly make the actual 3D world look like an image on a computer monitor where everything really is at the same distance to you. Now, are there differences between the binocular and monocular view? Yes! But this is because you have cut off access to some structural features of the optic array, and not because you have catastrophically lost access to the third spatial dimension, and the data support this latter analysis.

There are other examples to keep in mind. If monocular vision prevented 3D vision, prey animals like rabbits would never have evolved eyes arranged the way they are - the wider field of view would not be worth the trade-off. But they did, because the trade-off is worth it; they can see more and still perceive distance related information (such as time-to-contact of that wolf). Dreger even notes in her blog that she's still able to drive and the clerk at the DMV was entirely unfazed by her problem. So while binocular vision is great, monocular vision is pretty great too and at no point are you not perceiving in 3D.
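
The time-to-contact example can also be sketched. One standard way to specify it (often called tau) is the optical angle of the approaching thing divided by the rate at which that angle is expanding. The sketch below assumes a head-on approach at constant speed; the wolf's size, distance and speed are invented, and they are used only to generate the optical angle, never handed to the estimate itself.

```python
import math


def optical_angle(size, distance):
    """Angle (radians) subtended at the eye by an object of a given size and distance."""
    return 2.0 * math.atan(size / (2.0 * distance))


def time_to_contact(angle_now, expansion_rate):
    """Tau: optical angle divided by its rate of expansion (seconds, approximately)."""
    if expansion_rate <= 0:
        raise ValueError("the optical angle must be expanding for contact to occur")
    return angle_now / expansion_rate


# Hypothetical wolf: 0.8 m wide, 20 m away, closing at 10 m/s.
size_m, distance_m, speed_ms, dt = 0.8, 20.0, 10.0, 0.001

angle_0 = optical_angle(size_m, distance_m)
angle_1 = optical_angle(size_m, distance_m - speed_ms * dt)  # one millisecond later
expansion_rate = (angle_1 - angle_0) / dt

tau = time_to_contact(angle_0, expansion_rate)
print(tau, distance_m / speed_ms)  # tau is close to the true 2 seconds
```

The estimate uses only the angle and how fast it is growing, both available to a single eye, and at no point does it need to recover the distance or the speed.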

The framing of the problem matters, and the 'seeing in 3D requires two eyes' error is a great example, because people who know this framing will sit there with one eye closed, experience a 3D world and still tell you they can't see depth, because that's the only way they know to describe the change in their visual experience. The same goes for blind echolocators talking about the sounds creating 'images in their mind'; nothing of the sort is happening, but this is the vocabulary we have given them.

This also has consequences for the science we do. Instead of looking for how eyes interact with optic flow, we go looking for pictorial cues to depth such as height in the visual field and others, and then construct stories about how we weight and combine these cues to generate a 3D percept. Nothing of the kind is taking place, as is revealed when you ask the right questions (Mon-Williams and Bingham, 2008), but you can't ask the new question without the new vocabulary. Developing this new vocabulary is hard and will take many iterations; Gibson started it and spent 30 years on it, others have been at it for years, and our next round of development is only now nearly ready to send out, after 6 years' hard work on the blog, and even then we still won't be done. But as you can hopefully now appreciate, it's work worth doing and our continued progress as a science depends on getting the framing right.


  1. Thanks for this. As I think I've said, I know I use visual cues to create a 3D map. So I manage to do things like catch balls thrown to me. This is one reason I don't generally "miss" having stereoscopic vision.

    But I can tell you that when my brain is getting vision information from both eyes, it's like the difference between watching a regular TV and a high-definition TV. Everything looks hyper-real -- kind of hilarious, really.

    I'm not sure we are disagreeing. When I see with both eyes, I imagine it must be like the difference between what we see and what bugs/birds that can see more light forms see. It feels hyper-real, and like I said, kind of comical, like a visual joke, because everything looks like a silly movie set to me when I see with two eyes.

    1. Yes, I get that there are real differences, and those are interesting! (And weird for you, I would imagine). My interest here is mainly in the way we talk about monocular vision, and how we don't seem to notice it's not accurate :) So no, I'm not so much disagreeing as using this example to think about the bigger question.

  2. Can you clarify what you mean when you say that it's not true that blind echolocators have an image in their minds (or that the echolocation "creates" an image, if that's the key phrase.) What is happening if there's no image? And could you define how you're using "image" in this case?

    1. To the extent that 'image' has a visual connotation, it makes little sense to me that a congenitally blind person (like Daniel Kish) would create such a thing on the basis of acoustic information. I do think echolocating organisms (people and animals) do have a phenomenological experience; I just bet 10 bucks that 'image' is an inadequate way to characterise that experience.