Sampling 3d-space with Coherence-based Stereo
The fusion range of any stereo system is limited. In the human visual system, for example, the space around the current fixation point which can be fused, called Panum's area, extends over about 15 arcmin (depending somewhat on the stimulus used to measure it).
As the following diagram shows, this region is a rather small area of three-dimensional space.
There is of course a simple solution to this limited working range of the fusional system: humans and many other animals sample the surrounding space by constantly changing the fixation point of their eyes.
Selecting a new fixation point requires two choices: deciding on a view direction, and selecting a corresponding view distance in this direction. The first part is simple, but the second contains a nice circular problem: we started the business of stereo vision in order to calculate distances, but have now arrived at the conclusion that we need to know the distance of the fixation point before we can use our stereo system, with its limited fusion range.
The solution lies in using low-resolution copies of the images to calculate approximate depths for a vergence system. Reducing the original image sizes by a certain factor reduces the disparities in the stereo pair by the same amount. With appropriate reductions, the range of disparities can be brought into the fusion range of a small vergence network. The following figure shows an example: disparity maps calculated by a network which had only a small fusion range (4 pixels wide). The disparity range of the original stereo pair clearly exceeded this fusion range (left), but a smaller version could be fused without problems (right):
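The downscale-then-fuse idea can be sketched in a few lines. The sketch below uses simple per-pixel block matching as a stand-in for the vergence network (the original system is coherence-based, not SSD matching), and the helper names are hypothetical; the point is only that halving or quartering the image size shrinks the disparities by the same factor, bringing them into a small fusion range:

```python
import numpy as np

def downsample(img, factor):
    """Average-pool the image by `factor`; disparities shrink by the same factor."""
    h = img.shape[0] // factor * factor
    w = img.shape[1] // factor * factor
    return img[:h, :w].reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def coarse_disparity(left, right, max_disp=4):
    """Per-pixel matching over a small fusion range (here +/- max_disp pixels).
    A crude stand-in for the vergence network described in the text."""
    h, w = left.shape
    best = np.zeros((h, w), dtype=int)
    best_cost = np.full((h, w), np.inf)
    for d in range(-max_disp, max_disp + 1):
        cost = (left - np.roll(right, d, axis=1)) ** 2
        mask = cost < best_cost
        best[mask] = d
        best_cost[mask] = cost[mask]
    return best

# Usage: shrink the pair by a factor, fuse within the small range, rescale.
left = np.random.rand(64, 64)
right = np.roll(left, 8, axis=1)        # synthetic pair with an 8-pixel shift
small = coarse_disparity(downsample(left, 4), downsample(right, 4))
approx = small * 4                      # approximate disparities in original-image units
```

The recovered values are approximate, but they are exactly what a vergence system needs: a rough depth to verge to, after which the full-resolution fusion can take over.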
Interplay of Vergence and Fusion
However, even if a vergence movement was successful and correctly moved the two eyes onto the chosen fixation point, it is only guaranteed that this single point was transferred into the fusion range of the network (actually, to zero disparity). Data from different fixation points therefore usually need to be combined, and this needs to be done carefully, since there is no way of knowing beforehand the size or shape of the area around the fixation point which will also be fused correctly. Wrong estimates should be singled out before fusing data from different fixation points.
Combination would be simple if a verification measure could be obtained for the disparity estimates. In coherence-based stereo, there is such a verification measure intrinsically available in the algorithm: one can use directly the amount of coherence within a disparity stack as a verification value (more about this at the pages dealing with difficult data or the definition of coherence).
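As a rough illustration of such a verification value, consider a disparity stack: a (D, H, W) array holding, for every pixel, the responses of D disparity-tuned layers. A minimal sketch, not the exact coherence definition used by the algorithm, is to take the fraction of the total response concentrated in the winning layer; it is close to 1 where the layers agree on a single disparity and low where responses are spread out:

```python
import numpy as np

def coherence_verification(stack):
    """stack: (D, H, W) responses of D disparity layers at each pixel.
    Hedged stand-in for the coherence measure: fraction of the total
    response concentrated at the winning disparity."""
    return stack.max(axis=0) / (stack.sum(axis=0) + 1e-12)

def best_disparity(stack, d_min):
    """Winning disparity per pixel (layer index shifted by the stack's offset)."""
    return stack.argmax(axis=0) + d_min
```

Each disparity estimate then comes paired with its own verification value, which is exactly what the combination step below needs.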
Time for an example! Below, on the left, you see two superposed stereo images after alignment through vergence movements to the tip of the dragon nose.
Updating Accumulators
For the combination of the data from various fixation points, accumulators can be used. The combination rule is simple: new data is inserted only if its verification value exceeds the one already stored in the accumulator.
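The combination rule is a per-pixel maximum over verification values; a minimal NumPy sketch (array names are illustrative):

```python
import numpy as np

def update_accumulator(acc_disp, acc_ver, new_disp, new_ver):
    """Insert new disparity estimates only where their verification value
    exceeds the one already stored -- the combination rule from the text."""
    better = new_ver > acc_ver
    return np.where(better, new_disp, acc_disp), np.where(better, new_ver, acc_ver)

# Example: the second fixation wins only at pixels where it is better verified.
disp = np.array([[1.0, 2.0]])
ver = np.array([[0.9, 0.1]])
disp, ver = update_accumulator(disp, ver, np.array([[5.0, 6.0]]), np.array([[0.5, 0.8]]))
# disp is now [[1.0, 6.0]], ver is [[0.9, 0.8]]
```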
Putting it all together
Adding an automatic scanning algorithm completes the whole system; one basically chooses fixation points in image areas where the accumulated verification values are still low. The algorithm can be stopped after a preset number of fixation points, or when the verification is high enough everywhere.
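The scanning loop can be sketched as follows. The `measure` callback is a hypothetical placeholder for one complete verge-and-fuse step (it returns the verification map obtained at that fixation); the loop structure and both stopping criteria follow the text:

```python
import numpy as np

def choose_fixation(acc_ver):
    """Next fixation point: the pixel with the lowest accumulated verification."""
    return np.unravel_index(acc_ver.argmin(), acc_ver.shape)

def scan(acc_ver, measure, threshold=0.8, max_fixations=50):
    """Fixate where verification is still low, fuse, accumulate; stop after a
    preset number of fixations or once verification is high enough everywhere."""
    fixations = []
    for _ in range(max_fixations):
        if acc_ver.min() >= threshold:
            break
        y, x = choose_fixation(acc_ver)
        fixations.append((y, x))
        # verge to (y, x), fuse, and keep the better-verified data per pixel
        acc_ver = np.maximum(acc_ver, measure(y, x))
    return fixations
```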
Below are some sample runs of the system, obtained with uncalibrated stereo images. Note that even though the original stereo pairs are badly aligned (in fact, not at all ...), the algorithm is able to calculate the disparity map and the co-registered cyclopean view!
For many tasks, such as grasping an object or navigating through narrow passages, the co-registered disparity map output by the vergence-fusion system provides sufficient data, even though it only provides relative depth values. Converting these relative depth values into absolute depth values is simple once the separation between the two camera centers is known. This is of course a fixed value in the human visual system (well, only to a first approximation - the separation changes slightly with eye rotation!).
In summary, the combined vergence and fusion system locks onto the 3d-structure present in the scene, recovers this structure from the two stereo views, and in this way (re-)constructs the three-dimensional world.
The vergence algorithm described here is also used in the online image-processing stereo algorithm. Try it with your own images!