Teaching a Machine to See 

To see is to forget the name of the thing one sees.

— Paul Valéry

Computer scientists have discovered that teaching a machine to see is more difficult than they once supposed. Their initial approach was to translate the visual world into key words, such as “tree,” “boat” and “man,” and then get computers to recognize the corresponding objects in their field of vision. Computer scientist Alexei Efros and his colleagues at Berkeley’s Artificial Intelligence Research Lab realized that labeling images was not only labor-intensive but also introduced bias. So they developed algorithms based on what is called “self-supervised learning,” which bypasses labeling altogether. The algorithm learns to see by comparing what it sees with what it has seen before. “The central goal of my research is to use vast amounts of unlabelled data to understand, model, and recreate the visual world around us,” Efros says.

Self-supervised learning is an attempt to get computers to learn to see the way humans presumably do. A newborn emerges into a world of pure sensation, a “great blooming, buzzing confusion,” as the pioneering psychologist William James once described it. Gradually the infant learns to distinguish light from dark, heat from cold, hunger from thirst, foreground from background. Soon sounds become associated with these sensations, and words eventually become the concepts that give shape to his surroundings. Naming becomes the small child’s way of gaining mastery over the world.

"The world is presented in a kaleidoscopic flux of impressions which has to be organized by our minds -- and this means largely by the linguistic systems in our minds," wrote the linguist Benjamin Whorf. "We cut nature up, organize it into concepts, and ascribe significances as we do, largely because we are parties to an agreement to organize it in this way -- an agreement that holds throughout our speech community and is codified in the patterns of our language."

"Swifter than light the world converts itself into that thing you name," Emerson wrote in his journal, "and all things find their right place under this new and capricious system." Language turns everything into a gigantic jigsaw puzzle, with all the pieces carefully labeled. Even if we reassemble the pieces into a coherent picture, it's not the underlying unity we see but a pattern of puzzle pieces.

For humans, there is no bypassing the labeling process that Efros and his colleagues determined was ultimately a hindrance to a computer’s learning to see. We may think we are still capable of perceiving the world as it presents itself to us, but the pristine sensations that greeted us as newborns are already lost. The philosopher Owen Barfield has noted that “the perceptual world comes over its horizon already organized.” It is not just that we have attached names to sensations but that we have arranged them according to fundamental concepts of time and space that are deeply embedded in the grammar and syntax of our language.

There are undoubtedly evolutionary advantages to being able to attach words to the objects in our visual field, not least the ability to identify potential predators or prey quickly. And yet, other animals manage this without any verbal assistance whatsoever. The lowly cockroach, for example, has been around for some 280 million years, with a brain containing only about one million neurons, compared with roughly 86 billion in a human brain. A cockroach doesn’t have the brainpower to think about what it is seeing, which may help explain why it reacts quickly enough to elude even the most determined human predators.

“We quickly forget how to simply see things and substitute our words and our formulas for the things themselves,” wrote the Christian contemplative Thomas Merton in Zen and the Birds of Appetite. Contemplatives and visual artists find they must forget everything they think they know about the world in order to see clearly. For a photographer like me, learning to see is at once unlearning all I have seen, stripping away the words and concepts until nothing remains but pure sensation unencumbered by meaning. It is the world as a newborn might find it, all color, shape, texture, light and shadow: a world without words.

© Copyright 2004-2024 by Eric Rennie
All Rights Reserved