Perception is an act of measurement, and, like all acts of measurement, it needs a scale in order to be useful. Think about placing something on your kitchen scales; all that actually happens is that the object presses on the scale and the scale registers that something has changed by some amount in response (the location of a tray, for example). In order to know what that change means, the change is presented to us on a calibrated scale (by moving a needle around to point at some number, for example). The needle always moves the same amount for a given weight but the resulting number can vary (you might have an imperial rather than metric kitchen scale, for example). Without the scale, you can say that one thing is heavier than another by noting that it moves the scale more (this is an ordinal evaluation) but you need the scale in order to say what the weight difference is (the metric evaluation).
Visual perception measures the world in terms of angles; objects subtend a visual angle that depends on their size, distance, etc. Your thumbnail held at arm's length subtends about 1° of visual angle. You can get ordinal information directly from angles (the fact that one thing is closer/bigger/etc) but you need a scale to get the metric information required to use vision to control action. For example, you need to perceive how big something actually is in useful units in order to scale your hand size appropriately when grasping it; relative size doesn't help. One of the fundamental questions in (visual) perception research is, therefore, what are the metric units that the perceptual systems use to scale their measurements?
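To make the point about angles concrete, here is a minimal sketch of the standard visual angle formula, 2·atan(size / (2·distance)). The function name and the specific sizes and distances are my own illustrative assumptions (a roughly 1.2 cm thumbnail at roughly 70 cm), not measurements from any of the papers discussed here.

```python
import math

def visual_angle_deg(size, distance):
    """Visual angle (degrees) subtended by an object of a given size
    viewed front-on at a given distance (same units for both)."""
    return math.degrees(2 * math.atan(size / (2 * distance)))

# Assumed values: a ~1.2 cm thumbnail at ~70 cm (arm's length).
thumbnail = visual_angle_deg(1.2, 70)

# An object ten times bigger and ten times further away
# subtends the same angle, so the angle alone carries no metric size.
far_object = visual_angle_deg(12, 700)

print(f"thumbnail: {thumbnail:.2f} deg, far object: {far_object:.2f} deg")
```

Both calls return the same angle (a little under 1°), which is the problem in a nutshell: the angle on its own cannot tell you which of the two objects you are looking at, so something else has to supply the scale.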
Dennis Proffitt has been studying this question for a long time and is in favour of task-specific, body-scaled units. His evidence comes from studies in which people perceive their environments differently as a function of their ability to act on that environment. Probably the most well-known example is the study that showed people judge hills to be steeper when they are wearing a heavy backpack (Bhalla & Proffitt, 1999). The idea is that the backpack will make traversing that hill more difficult, and when the visual system measures the slope, it scales its measurement in line with this perceived effort. The hypothesis is that this is functional; it's a feature of the visual system that helps us plan appropriate actions.
Perspectives on Psychological Science recently hosted a point-counterpoint debate on this topic. Firestone (2013) reviewed the literature on this type of action-scaling in perception and concluded that not only do the data not really support Proffitt's account, but that this account couldn't work even in principle. Proffitt (2013) rebutted Firestone's arguments and defended his view. I'm interested in this because Proffitt is at least a little ecological, and the basic idea he defends is one I would defend as well (although not in the form that he proposes). So who won?
The paternalism of spatial vision
Firestone begins by framing Proffitt's theory as paternalistic. Proffitt's theory is that vision contains systematic biases that are introduced in order to make us behave in a certain way. These biases are 'well intentioned white lies' that 'bias perceivers towards favourable actions' (Firestone, 2013, p. 456). Proffitt rejects this characterisation as inaccurate; this body scaling is not about lying, it's just about calibration. Although I think Proffitt (and to a greater extent his student, Jessica Witt) do sometimes talk in a way that opens them up to the paternalism label, paternalism is not really a fair label because it implies being misled. Calibration is not a process of distortion, it's a critical part of measurement, and just because the perceived result doesn't match what a physicist might produce doesn't make the perceiver in error (an argument I lay out in detail here).
That aside, let's score the main arguments.
Argument 1: The effect sizes are the wrong size for the job
Firestone's first real argument against Proffitt is about whether or not the action-scaling found in these experiments can possibly be functional. He reviews a range of results and notes that the resulting change in perceived slope (or passability of an aperture, or what have you) is typically quite small, and generally smaller than the actual change that has occurred as a result of the experimental manipulation. So if vision is telling us white lies, they are not very useful white lies because they don't match the change in the world.
Proffitt has three replies; his account is the only one that even predicts these effects should occur in the direction they do (irrelevant), adaptation takes time (true, but weak; see below) and you can see good matching between bias and action in overlearned tasks such as grasping (ok but still weak). In effect he's arguing that calibration has a dynamic (i.e. it occurs over time and in a characteristic way) which is true. The problem is that Proffitt has never studied the dynamics in any detail (see Mon-Williams & Bingham, 2007 for an example of how to do this) which he should have by now if he wants to make this argument. In addition, if they are to be functional, then the adaptation really needs to occur on the timescales he measures. Firestone takes this one because there is critical work yet to be done. Score: 1-0 Firestone.
Argument 2: Action specific units cannot be compared
Firestone notes that if you want to choose how to traverse some distance, action scaling is a problem because the units for walking, running, throwing etc will all be different. If the scales are different, you can't compare the measurements to pick the best option. Proffitt agrees, but notes that this isn't a problem because he never claims the system tries to compare measurements to choose actions and the evidence suggests that the action-scales really are only applied within relevant tasks (what he calls action boundaries).
Action scaling is indeed task-specific and cannot be directly compared, and Proffitt is right that it's not a problem (because action selection is about affordances, not action scaling; I've argued this in some talks on throwing recently that I should really write up). Score: 1-1, although note that Proffitt doesn't propose a solution to the problem of action selection; I wouldn't pick on this except that he gets huffy about Firestone critiquing without replacing.
Argument 3: There's no information for ability scaling
In order to apply a metric to a visual measurement, there has to be a relation between these two things that has detectable consequences; there needs to be information about the scale as well as the measurement. Firestone discusses eye-height scaling of object sizes. The horizon always cuts objects at eye-height, regardless of distance. Some simple geometry means that because the visual system has access to the angular size of the object and the angular size of the part below the horizon, then it has access to the ratio (the size of the object in eye-height units). Eye height is a viable scale because it has visually detectable consequences; action scales such as walkability or jumpability do not, and therefore cannot be viable scales.
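The 'simple geometry' can be sketched as follows. This is my own minimal illustration, assuming an object standing on flat ground and an observer whose horizon is at eye level; the function name and the numbers are hypothetical. Note that the ratio is exact when taken over the tangents of the two angles; the ratio of the raw angles is only a good approximation when the angles are small.

```python
import math

def size_in_eye_heights(object_height, eye_height, distance):
    """Recover object size in eye-height units from two visual angles only."""
    # Angle from the horizon down to the object's base on the ground.
    below = math.atan2(eye_height, distance)
    # Signed angle from the horizon up to the object's top
    # (negative if the object is shorter than eye height).
    above = math.atan2(object_height - eye_height, distance)
    # The distance cancels out of this ratio, leaving height / eye height.
    return (math.tan(below) + math.tan(above)) / math.tan(below)

# Assumed values: a 2.4 m object seen by an observer with 1.6 m eye height.
for distance in (2.0, 5.0, 20.0):
    ratio = size_in_eye_heights(2.4, 1.6, distance)
    print(f"distance {distance:>4} m: {ratio:.3f} eye heights")
```

The ratio comes out the same at every distance, which is what makes eye height usable as a ruler: everything needed to compute it is available in the optics. Firestone's complaint is that nothing comparable has been identified for walkability or jumpability.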
Proffitt appeals to dynamics again; calibration takes time and purposeful behaviour (you act, perceive the consequences and correct the errors). He also highlights that eye-height scaling, while a nice simple example, doesn't actually seem to get used much anyway.
Proffitt misses a couple of key points; calibration requires information, so this is a problem for him. Of course, there are non-visual sources of information that might solve the problem (Firestone only talks about vision). Firestone gets the point for being basically right and for Proffitt not paying enough attention to the critique. Score: 2-1 Firestone.
Argument 4: Visual space doesn't look like it's warping
Firestone says that if vision is distorting space to help guide action, we should experience this warping (because some of the effects can be quite large, contra Argument 1). But we don't. Proffitt has three replies. First, he states that vision is trying to provide stable access to the environment, so lots of perturbations are filtered by the system. As an example, he notes the fact that our view of the world does not whizz around as we saccade our eyes three times a second because of saccadic suppression. He then notes some work which found that we don't notice even when a (virtual) environment really is shrinking and growing (Glennerster et al, 2006). Finally he notes that changing which ruler you're using doesn't actually change the locations of objects in space. Think about measuring a gap in centimetres, then switching to an inch ruler. The distance is the same, the number has changed, but this only matters if you keep acting as if you still used a centimetre ruler. In the same way, people use the current calibration, not the previous one.
Proffitt wins this one across the board. Appeals to visual experience never help in arguments about how vision actually does what it does. While we do perceive the world, we do so by detecting information, and we have no real access to the experience of detecting information per se. More important, however, is Proffitt's point about how all that's changing is the ruler. We have no privileged access to the world; all we 'know' is what the calibrated detection of information tells us, and different calibrations are incompatible (see Argument 2) which means there's no way to compare them and identify a difference. As Firestone notes in Argument 3, you need information to detect anything, and this applies to him as much as to Proffitt. Final Score: 2-2.
Tiebreak: Everyone loses
I really want to like Proffitt's work. His heart is in the right place, after all; perception really is scaled in task-specific action units. But his work only ever scratches the surface and rarely deeply enough to justify his conclusions. He needs task dynamics, he needs to frame his work in terms of calibration and he is (as Firestone rightly points out) in desperate need of some information to back this all up.
Given this, I really wanted Firestone's critique to have the requisite substance, because a solid review of this literature and an analysis of what's missing would be a valuable contribution to the literature. But he really only has one major point (about information) and while he's right to say Proffitt suffers here, he's wrong to say action scaling can never have informational consequences. It might not have visual consequences the way eye-height does, but that's not the point.
So really the losers here are us, the readers. Instead of a substantive analysis and defence of a problematic but on-the-right-track theory of perceptual scaling, we got a mixed bag of viable points generally poorly defended and hidden in amongst some irrelevant information and a surprising amount of snark. I'm all for being feisty and punchy when it's called for but everyone just seemed a bit pushy throughout which made this exchange less productive than it could have been. Full marks to Perspectives, though, for allowing the debate and successfully negotiating what I'm sure was a complicated review process.
Bhalla M. & Proffitt D.R. (1999). Visual-motor recalibration in geographical slant perception, Journal of Experimental Psychology: Human Perception and Performance, 25 (4) 1076-1096. DOI: 10.1037/0096-1523.25.4.1076
Firestone C. (2013). How "Paternalistic" Is Spatial Perception? Why Wearing a Heavy Backpack Doesn't--and Couldn't--Make Hills Look Steeper, Perspectives on Psychological Science, 8 (4) 455-473. DOI: 10.1177/1745691613489835
Glennerster A., Tcheang L., Gilson S.J., Fitzgibbon A.W. & Parker A.J. (2006). Humans Ignore Motion and Stereo Cues in Favor of a Fictional Stable World, Current Biology, 16 (4) 428-432. DOI: 10.1016/j.cub.2006.01.019
Mon-Williams M. & Bingham G.P. (2007). Calibrating reach distance to visual targets, Journal of Experimental Psychology: Human Perception and Performance, 33 (3) 645-656. DOI: 10.1037/0096-1523.33.3.645
Proffitt D.R. (2013). An Embodied Approach to Perception: By What Units Are Visual Perceptions Scaled?, Perspectives on Psychological Science, 8 (4) 474-483. DOI: 10.1177/1745691613489837