In reading research the moving mask and moving window paradigms have proven to be invaluable in determining the chronometric and spatial characteristics of processing written text. The success of these methods has led to a demand for their application in research on real-world scene perception. However, we will argue that the technical implementation of eye-contingent mask (window) movement across a stable text cannot be applied to scene research. A new technique is proposed that allows graphical masks or windows of arbitrary form, size and content to be moved quickly over a complex graphical stimulus. This moving overlay technique makes use of the ATVista Graphics Adapter, a board with the ability to mix an internally stored and an externally generated image into one composite image. By programming the internal image to be moveable and partly transparent, a high-performance moving mask or window is created. The technique is implemented on a standard PC interfaced with an eyetracker, thus bringing mask (window) movement under on-line eye-movement control. We discuss general principles of the technique and illustrate them with performance data from a concrete experimental setup.
The moving mask paradigm was applied to scene perception, to determine whether there exists a fixed, privileged period for foveal information extraction at the beginning of each fixation. Subjects freely explored line drawings of realistic scenes in the context of a search task during which their eye movements were recorded. During each fixation, foveal information was masked whenever the fixation lasted longer than a preset mask onset delay. Scene inspection time was used as a global measure of task difficulty. An asymptotic masking curve was observed, with longer mask onset delays producing a decrease in task difficulty, asymptoting to base level performance at onset delays of 45-75 ms. This suggests that most of the foveal scene information can be encoded within this interval. Contrary to earlier findings in reading research, mean fixation durations, a local measure of encoding difficulty, did not increase when the mask onset delay decreased. An analysis of fixation duration distributions, however, did reveal a time-locked effect of mask onset on fixation duration. These findings are discussed in terms of their implications for estimates of the chronometry of visual information encoding in scene perception.
Keywords: Foveal Masking, Scene Perception, Fixation Duration.
Moving mask and window paradigms are used to study the spatial and temporal aspects of visual information processing. Due to technical limitations, these paradigms have frequently been applied to reading, but only rarely to scene perception. Existing moving mask or moving window techniques for graphical stimuli usually blank the display inside or outside a square window. A new moving window technique is presented here that uses a custom-built video switcher and three synchronized video boards. The first video board contains the stimulus that is presented inside the moving window. The second video board contains the stimulus to be presented outside the window. The third video board contains a black-and-white image of the window that is used as a key signal for the video switcher. The video switcher selects between the video signals of the first and the second video board on a pixel-by-pixel basis, controlled by the key signal generated by the third video board. By panning the image of the third video board, the window can be moved very rapidly. Presently we are using oval windows, centered on the fixation spot as measured by an eye-tracker. The normal stimulus is visible inside the window, whereas manipulated information is presented outside the window, or vice versa.
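The pixel-by-pixel selection described above can be illustrated in software. The following is only a minimal sketch of the keying principle, not the hardware implementation: the function names `make_oval_key` and `switch_video`, the list-of-lists image format, and the rule that pixels outside the panned key image count as "outside the window" are all assumptions made for illustration.

```python
def make_oval_key(width, height, semi_x, semi_y):
    """Build a key image (hypothetical helper): 1 inside a centred oval, 0 outside."""
    cx, cy = (width - 1) / 2, (height - 1) / 2
    return [
        [1 if ((x - cx) / semi_x) ** 2 + ((y - cy) / semi_y) ** 2 <= 1 else 0
         for x in range(width)]
        for y in range(height)
    ]


def switch_video(inside, outside, key, gaze_x, gaze_y):
    """Select each output pixel from `inside` or `outside`, controlled by the key.

    The key image is panned so that its centre lands on the gaze position;
    moving the window only changes the pan offset, not the key image itself.
    """
    pan_x = gaze_x - len(key[0]) // 2
    pan_y = gaze_y - len(key) // 2
    composite = []
    for y, (in_row, out_row) in enumerate(zip(inside, outside)):
        row = []
        for x, (in_px, out_px) in enumerate(zip(in_row, out_row)):
            ky, kx = y - pan_y, x - pan_x
            in_key = 0 <= ky < len(key) and 0 <= kx < len(key[0])
            # Assumption: anything beyond the panned key image is "outside".
            selected = key[ky][kx] if in_key else 0
            row.append(in_px if selected else out_px)
        composite.append(row)
    return composite
```

In this sketch, repositioning the window costs nothing more than computing a new pan offset, which mirrors why panning the key image allows the window to be moved very rapidly.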
Two experiments are reported that employed eye movement contingent display changes to investigate the effects of foveal image degradation on scene perception. Stimulus information was degraded within an ovoid window that was aligned with the fixation position, while outside of the window, an undegraded line drawing of a real-world scene was visible. Degradation types included full masking by noise, partial masking by noise, and reduction of image contrast. Image degradation occurred after a preset delay from the onset of each fixation, ranging from 0 to 105 ms. Manipulation of the degradation onset delay showed that, in the context of an object-decision task, sufficient foveal information could be encoded within the initial 50 ms of fixations. Saccade programming, however, was affected regardless of the degradation delay. Fixation locations were selected preferentially from undegraded stimulus areas. The results support a local-to-global hierarchy of stimulus utilization.
Five experiments compared foveal and peripheral information acquisition during scene perception. Stimulus information was degraded foveally or peripherally during predetermined periods of fixations, using an eye movement contingent display-change technique. It was shown that foveal encoding for local stimulus identification generally occurred at the beginning of fixations, and was completed within 45 to 70 ms. Foveal encoding, however, continued for other perceptual processes that presumably operated on the global stimulus. No clearly defined interval appeared to be associated with information acquisition for these processes. They apparently could start at variable moments, but occurred infrequently during the early part of fixations. Generally, a local-to-global sequence of processing seemed to be present.
In the present experiment, participants were exploring line drawings of scenes in the context of an object-decision task, while eye-contingent display changes manipulated the appearance of the foveal part of the image. Foveal information was replaced by an ovoid noise mask for 83 ms, after a preset delay of 15, 35, 60, or 85 ms following the onset of fixations. In control conditions, a red ellipse appeared for 83 ms, centered around the fixation position, after the same delays as in the noise-mask conditions. It was found that scene exploration was hampered especially when foveal masking occurred early during fixations, replicating earlier findings. Furthermore, fixation durations were shown to increase linearly as the mask delay decreased, which validates the fixation duration as a measure of perceptual processing speed.
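The timing of the fixation-contingent mask can be written out as a small sketch. The delay and duration values are those reported above; the function names `mask_interval` and `mask_visible` are hypothetical, and the sketch assumes an ideal display, ignoring refresh quantisation and any lag in detecting fixation onset.

```python
MASK_DURATION_MS = 83                 # mask duration reported in the abstract
MASK_DELAYS_MS = (15, 35, 60, 85)     # preset delays reported in the abstract


def mask_interval(delay_ms, duration_ms=MASK_DURATION_MS):
    """Return (start, end) of the mask in ms, relative to fixation onset."""
    return (delay_ms, delay_ms + duration_ms)


def mask_visible(time_since_onset_ms, delay_ms, duration_ms=MASK_DURATION_MS):
    """True while the mask replaces the foveal stimulus.

    Assumption: the mask is on from the delay up to, but not including,
    delay + duration, and the fixation lasts at least that long.
    """
    start, end = mask_interval(delay_ms, duration_ms)
    return start <= time_since_onset_ms < end


if __name__ == "__main__":
    for delay in MASK_DELAYS_MS:
        start, end = mask_interval(delay)
        print(f"delay {delay} ms: mask on from {start} to {end} ms after fixation onset")
```

Under these assumptions, the shortest delay leaves only the first 15 ms of a fixation unmasked, whereas the longest delay leaves the first 85 ms unmasked.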
The time course of foveal and peripheral information encoding during fixations on scenes (real-world situations) was studied, using eye-contingent display changes. The presence of foveal or peripheral information was manipulated as a function of the time elapsed since the onset of fixations. Foveal encoding for object identification was shown to occur during the initial 50 ms of fixations. No clearly defined fixation period appeared to be associated with the use of peripheral information.
While viewing a scene (a real-world situation, such as a kitchen or a playground), people mainly fixate information-rich areas, such as object locations. During saccades from one fixation location to another, practically no information is extracted from the scene. During a fixation, foveal information (the fixated object) is processed. Moreover, a new (extrafoveal) saccade target must be selected during the fixation, and the corresponding motor program prepared. In reading research, useful paradigms have been developed to study the spatiotemporal characteristics of these processes, especially the 'moving mask' and 'moving window' paradigms. Two new display-change techniques enable the use of these paradigms in scene-perception research. In a first experiment, the fovea was completely masked after a manipulated delay. Foveal information could apparently be extracted within 45-75 ms. Fixation durations, however, were not affected by the mask. In a second experiment, the contrast of the foveal stimulus was decreased, which made it more worthwhile to prolong a fixation. Nonetheless, decreased foveal contrast hardly disturbed scene perception, presumably because an iconic representation of the foveal stimulus could be formed before the contrast was lowered. When, in a third experiment, only a proportion of the foveal pixels was masked, fixation durations did increase at short mask delays.
A set of line drawings, which has been used extensively in perception research of the Laboratory, was digitized to create a new image library. The new library includes several important improvements: Line drawings were digitized at a much higher resolution, illumination of the images was more uniform, and a software toolbox was developed which utilizes the "transparent mode" of the ATVista Videographics Adapter to simplify the construction of line-drawing stimuli.
During scene perception, the human visual system differentially uses coarse and fine visual information. Coarse (low frequency) information plays a main role in initial scene identification, whereas fine (high frequency) information is required for object identification. Scene exploration is characterized by an object-to-object scanning pattern to bring high frequency information into foveal vision. In the present study we address whether this implies that during scene exploration, high frequency information is processed exclusively in foveal vision. Subjects freely explored realistic grey-scale scenes in the context of a search task. Eye movements were tracked to position a moving window of elliptical shape at the point of fixation. Within the window, the normal, full frequency version of the scene was presented. Outside the window, either a low-pass or high-pass filtered version of the same scene was displayed. A control condition was included where the entire stimulus was displayed in the full frequency version, but the window was outlined by a white ellipse. The moving window was generated by three video boards connected to a user-designed video switcher. Mean window repositioning delay after the onset of a fixation was 15 ms. Window size was 6 x 4.6 degrees, whereas the complete stimulus subtended 16 x 12 degrees. The results showed that scene exploration benefits more from high frequency peripheral information than from low frequency peripheral information.
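As a rough check on the reported geometry, the share of the stimulus covered by the window can be worked out from the sizes given above. This sketch assumes that 6 x 4.6 degrees are the full horizontal and vertical axes of the ellipse and that the stimulus is a 16 x 12 degree rectangle; the function name `window_area_fraction` is hypothetical.

```python
import math


def window_area_fraction(win_w_deg, win_h_deg, stim_w_deg, stim_h_deg):
    """Fraction of a rectangular stimulus covered by an elliptical window.

    Assumption: window sizes are full axis lengths, so the semi-axes are
    half of each, and the ellipse area is pi * a * b.
    """
    ellipse_area = math.pi * (win_w_deg / 2) * (win_h_deg / 2)
    stimulus_area = stim_w_deg * stim_h_deg
    return ellipse_area / stimulus_area


if __name__ == "__main__":
    fraction = window_area_fraction(6.0, 4.6, 16.0, 12.0)
    print(f"window covers about {fraction:.1%} of the stimulus")
```

Under these assumptions the window covers roughly a ninth of the display, so the filtered version of the scene occupies most of the stimulus area at any moment.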
In a previous moving window study it was found that scene exploration benefits more from peripheral information of high spatial frequency than of low spatial frequency. In the present study, degraded versions of realistic full-colour scenes were presented peripherally during the initial 150 ms of fixations, while foveally the undegraded scene was presented. The undegraded version of the scene was visible both foveally and peripherally during the later part of fixations. During the initial 150 ms, the peripheral part of scenes was lowpass, bandpass, or highpass filtered, blanked, or decreased in luminance. In a no-change condition, the undegraded scene was presented throughout the whole fixation. Subjects freely explored the scenes in the context of an object-decision task. The results indicated that, to some extent, peripheral information is utilized during the initial 150 ms of fixations. However, no preference for a specific spatial-frequency range was found.
A number of studies indicated that the human visual system differentially uses coarse (low frequency) and fine (high frequency) information in peripheral vision. In two previous moving window studies, we tried to determine which kind of peripheral information is used at which moment during a fixation. Subjects explored drawings of realistic scenes in the context of a search task. Their eye movements were recorded to position an elliptical window at the fixation point. Inside the window the unmanipulated scene version was presented. Outside the window (i.e. peripherally) information was degraded during a certain period. The degradation included highpass, lowpass and bandpass filtering. In the first experiment, peripheral information was degraded during the initial part of a fixation (i.e. during the first 100 ms). Unmanipulated scene information was available during the final part of a fixation. No significant differences were found between the filtered versions, suggesting that peripheral information is mainly needed during the final part of a fixation in order to choose a new fixation point. In the second experiment, the periphery was therefore degraded during the final part of each fixation (i.e. after 100 ms). There were again no significant differences between filter conditions. This lack of effect was explained in terms of temporal integration of information, since spatial frequency information was simply removed from the image rather than masked. In the present experiment we used the same paradigm but tried to circumvent temporal information integration. Instead of removing a certain spatial frequency range after 100 ms, we replaced it by noise. The information of other spatial frequencies was maintained. Several eye movement parameters (e.g. fixation duration, saccadic amplitude) were examined to see which spatial frequencies were most useful during the later part of a fixation. High spatial frequencies seem to be most beneficial.
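The difference between removing a spatial frequency range and replacing it by noise can be illustrated with a deliberately simplified sketch. It is not the filtering used in the experiments: it works on a one-dimensional signal rather than a scene image, splits it into only two bands, and uses a moving average as a crude low-pass filter. All function names and the `radius` and `seed` parameters are assumptions made for illustration.

```python
import random


def low_pass(signal, radius=2):
    """Crude low-pass filter: moving average, with edge values repeated."""
    n = len(signal)
    smoothed = []
    for i in range(n):
        window = [signal[min(max(j, 0), n - 1)]
                  for j in range(i - radius, i + radius + 1)]
        smoothed.append(sum(window) / len(window))
    return smoothed


def high_pass(signal, radius=2):
    """High-pass residual: the signal minus its low-pass version."""
    return [s - l for s, l in zip(signal, low_pass(signal, radius))]


def remove_high(signal, radius=2):
    """Removal: the high band is simply dropped, leaving the low band."""
    return low_pass(signal, radius)


def replace_high_with_noise(signal, radius=2, seed=0):
    """Replacement: the low band is kept and high-pass noise fills the high band."""
    rng = random.Random(seed)
    noise = [rng.uniform(-1.0, 1.0) for _ in signal]
    return [l + h for l, h in zip(low_pass(signal, radius),
                                  high_pass(noise, radius))]
```

In the removal case the high band is left empty, whereas in the replacement case it is filled with new, uninformative content, which is the distinction the present experiment relies on to counter temporal integration.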
Crane and Steele (1978) describe that saccade data collected with a Dual Purkinje Image (DPI) Eyetracker show a post-saccadic overshoot. It is claimed that the overshoot is generated by lens movement. Analysis of data from experiments with the DPI Eyetracker shows major interindividual differences in the occurrence, size and pattern of the overshoots. The overshoot in the data causes problems for DPI Eyetracker users: How can the beginning of a fixation be determined? This becomes a major problem when saccade durations and fixation durations must be compared with results from eyetrackers using a different measurement principle. For eye-movement contingent display changes, too, one needs to be sure whether the saccadic movement is still going on, or whether and when the new fixation starts. The results of a small technical experiment, comparing artificial-eye results with the data from a human subject, show that the overshoot is not due to machine artifacts, but is the result of characteristics of the human eye. In a second experiment we will try to show a relationship between saccade amplitude and overshoot size and duration, and an interaction with accommodation. The importance and implications of lens movements will be discussed: The lens movement has consequences for determining when 'vision' starts after the saccade, i.e., when saccadic suppression stops; a moving eye lens also means a moving retinal image.
Crane, H. D., & Steele, C. M. (1978). Accurate three-dimensional eyetracker. Applied Optics, 17, 691-705.