The Feasibility of Audio Animation
© Koda 1988-1999
Imagine putting on a pair of headphones and suddenly
finding yourself suspended in the center of a spherical
universe - a world you "see with your ears" - in every
direction around you at once.
A low, rumbling sound takes on the shape of the planet Saturn; the "sizzling" rings surrounding it pass harmlessly through your body as you approach the massive sphere. Without turning your head you observe the rings expanding away behind you, as a spacecraft descending from overhead suddenly releases a barrage of "photon torpedoes" against the rotating space station far below. You drift toward the surface then pass right through the center of the planet, emerging on the other side where you float buoyantly among the "tinkling" stars. The sounds of orchestral instruments appear as 3-dimensional geometric shapes, dancing and spinning all around you in a choreographed ballet celebrating the birth of a new universe; a universe made entirely of sound.
Imagine being blind; casting away your long white cane to replace it with a pair of "earglasses"; tiny video cameras replacing the usual lenses, with small tubes conducting the sound from the earpieces to the entrance of your ear canals and a miracle of electronics controlling it all from a shirt pocket computer. Imagine how your sightless world, once limited in size to the length of your outstretched hand, has now expanded to include trees and buildings, people moving along sidewalks and cars maneuvering through busy intersections - even clouds. With a flip of a switch only the contrasts between light and dark would be recognized by the video cameras, which could zoom in to read house numbers or street signs.
The idea of looking for some way to create "pictures" with sound first occurred to me in 1971. It became an obsession of sorts, always plaguing my imagination while I found myself stuck within the limitations of being a typical electronic musician. Perhaps you can imagine the difficulty involved in searching for a way to turn the music you hear into pictures you see in your mind. To me it seemed as challenging as inventing the "teleporter" used by the Star Trek crew or discovering anti-gravity.
If you hear a recording of the sound made by a waterfall, you can imagine being near one, but you don't see the size or shape of the falls; the "image" is merely inferred. What does a house sound like? How do you hear a ten-foot-square rotating cube? If I ask you to imagine seeing such a cube, and you pause long enough to do so, you can easily perceive a mental image of that cube. The amount of detail in the image is dependent upon the amount of attention you apply to perceiving it, and as any good artist knows, with practice the ability to visualize improves. Mental images are "the stuff of dreams" and anyone who has suddenly awakened from a nightmare can attest to the intense vividness with which such non-tangible images can be perceived.
The task I set before me was to "paint" these mental images with sound: Not merely to infer such images, but to construct them with size and shape and specific detail. After years of very intense head scratching, I finally discovered a way to make it happen.
The theoretical method I have devised is called psychoacoustic animation. (Audio Animation is my trade name for the process.) The "pictures" to be made with this process are referred to as "sound shapes" or Audio Images. You can perform a very simple demonstration, right now, that will result in the creation of a crude Audio Image. It is such a simple concept it seems amazing that no one thought of it before, yet the technology required to produce these effects electronically has only become available in recent times. Ready?
Simply move your arm in a wide circular pattern while continually snapping your fingers. You will notice that you can hear the location where each snap occurs, and you will also recognize that the sounds are arranged in a circular pattern. In other words, you can HEAR A CIRCLE : A "picture" made entirely with sound.
The human auditory system is capable of recognizing both the direction and the distance between ourselves and the sources of the sounds we hear. It does this by analyzing the nature of the sound waves that strike our eardrums. We can pinpoint the source of sounds that come from above us, below us, to the sides or from front and back, all without having to turn our heads. This implies that IF you were capable of moving your hand fast enough, while snapping your fingers several hundred times a second, instead of a circle you could generate a sphere or some other 3-dimensional shape. These 3-D sound shapes could be created behind you, under your feet etc., and you would not need to turn your head to "see" them. It is even possible to create such images inside your head. Of course, no one's hand can generate sounds that quickly, but a computer can. The question then becomes, "how do you get a computer to generate Audio Images and present them to a listener wearing headphones?"
To begin with, it should be recognized that sounds originating from speakers or headphones can be made to appear to come from somewhere else. Anyone who has spent time listening to music through headphones will be aware of panning effects, where a sound begins in one ear then moves through your head to the other ear. When the music is monitored with loudspeakers the sound can be panned anywhere between the two speakers. This is accomplished by varying the intensity of the sound pressure level in each channel, with the sound appearing to originate from a position closer to the louder source.
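This intensity-panning idea can be sketched in a few lines of Python. The equal-power pan law used here is a standard audio-engineering convention, not something taken from the text; it keeps total loudness constant as the sound moves between channels.

```python
import math

def pan_gains(position):
    """Equal-power pan law. position runs from -1.0 (hard left)
    to +1.0 (hard right); returns (left_gain, right_gain).
    The squared gains always sum to 1, so loudness stays constant."""
    angle = (position + 1.0) * math.pi / 4.0  # map -1..+1 onto 0..pi/2
    return math.cos(angle), math.sin(angle)

def pan_sample(sample, position):
    """Split one mono sample into a stereo pair at the given position."""
    left, right = pan_gains(position)
    return sample * left, sample * right

# A centered sound is equally loud in both ears;
# sweeping position from -1 to +1 over time gives the familiar
# through-the-head headphone panning effect.
center_left, center_right = pan_gains(0.0)
```

Sweeping `position` smoothly across a buffer of audio samples reproduces exactly the headphone pan described above: the sound appears to travel from one ear, through the head, to the other.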
A device you can add to your stereo system, called the Carver Sonic Hologram Projector, causes music originating from two typically positioned loudspeakers to appear to come from anywhere in the 180 degree field in front of the listener. It does this by adding varying amounts of time delay to separate frequency ranges of the musical spectrum. Some frequencies are delayed more than others, some are delayed only in the right channel and some only in the left. These effects are also noticeable when using headphones, and include not only a larger stereo spread, but also some element of depth positioning of the various instruments heard in the recording.
In the PBS television series "NOVA", a program titled "Artists in the Lab" shows John Chowning describing the unusual trajectory of a tone "moving beyond the walls" of a room containing four speakers. This computer generated tone is processed to vary the time of arrival of the tone at each ear, the intensity of the sound pressure at each ear, and the amount of reverberation (reflected sound waves), and also incorporates the Doppler effect. The Doppler effect refers to changes in pitch associated with a moving sound source; e.g., the horn of an approaching train sounds higher in pitch than it does after the train passes the crossing.
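The Doppler component of such processing follows the standard physics formula for a moving source and a stationary listener; this sketch is ordinary textbook physics, not code from Chowning's system, and the 440 Hz tone and 30 m/s velocity are arbitrary illustrative values.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def doppler_pitch(source_hz, radial_velocity):
    """Perceived frequency of a tone from a source moving at
    radial_velocity m/s relative to a stationary listener
    (positive = approaching, negative = receding)."""
    return source_hz * SPEED_OF_SOUND / (SPEED_OF_SOUND - radial_velocity)

# A 440 Hz horn on a train doing 30 m/s (about 67 mph):
approaching = doppler_pitch(440.0, 30.0)   # higher than 440 Hz on approach
receding    = doppler_pitch(440.0, -30.0)  # lower than 440 Hz after passing
```

A synthesizer animating a moving sound source would recompute the radial velocity toward the listener at each instant and resample the tone accordingly, which is what produces the characteristic pitch glide as a virtual object sweeps past.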
Another example of sounds appearing to originate from places other than their physical source is Holophonics. This binaural recording process (binaural meaning 'two ears') captures much of the locational soundwave information which naturally occurs in the environment. It is a relatively simple method involving the placement of small microphones in the ear canals of a dummy head. When recordings made with this method are monitored with headphones, listeners are presented with the illusion of sounds occurring all around their heads. Examples include someone whispering in your ear, or the sound made by a pair of scissors while getting your hair cut. Other binaural recordings and devices include the Cal Rec Mic System (a British company); specialized headphones invented by the Matsushita company; and a series of German recordings made by putting microphones inside the head of a cadaver.
It is obvious that sounds originating from headphones (or loudspeakers) can be made to appear to come from somewhere else. This is a result of the way the human auditory system operates. Horizontal localization is primarily a function of four factors: a) intensity differences at each ear, b) the time of frontal wave arrival at each ear, c) the relative phase of the signal at each ear, and d) the amount of frequency content in one ear relative to the other. Auditory depth perception is related to the ratio of the direct energy to the indirect (reflected) energy. Vertical perception involves frequency content in the neighborhood of 8000 Hz. Just snap your fingers around your head and it becomes obvious that the human auditory system is capable of localizing sounds in the vertical plane, the horizontal plane, and in depth.
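Factor (b), the difference in arrival time at each ear, can be approximated with Woodworth's classic spherical-head formula from the psychoacoustics literature. This is a general approximation, not a formula from the text, and the head radius below is an assumed average value.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s
HEAD_RADIUS = 0.0875    # m; an assumed average human head radius

def interaural_time_difference(azimuth_deg):
    """Woodworth's approximation of the interaural time difference,
    in seconds, for a distant source at the given azimuth
    (0 = straight ahead, 90 = directly off one ear)."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

# The delay grows from zero for a source straight ahead
# to a maximum of roughly two-thirds of a millisecond at the side:
front = interaural_time_difference(0.0)   # 0 seconds
side  = interaural_time_difference(90.0)  # about 0.00066 seconds
```

A synthesis system could impose exactly this per-ear delay (together with an intensity difference) on each click it generates, which is one way a computer could place a sound at a chosen azimuth through headphones.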
In order to generate Audio Images, the amount of potential resolution (detail in the images) and the method of presenting the sounds to form specific shapes must be understood. Bekesy discovered that people can separate the location of sounds on the horizontal plane in two degree increments (about two centimeters apart at the forehead). His tests were conducted using an open air spark as the noise source, and modern computer equipment may be capable of generating sounds we can localize down to increments of just one degree. This would allow placing 360 individual points of sound in a horizontal circle surrounding the listener, versus 180 different sounds at two degree increments. The Just Noticeable Difference (JND) between sound positions in terms of depth has not yet been determined, but if you care to experiment yourself, you will find that the JND, moving outward from your face, is in the fraction of an inch range. You will also note that the further away the sound source is, the greater the interval between recognizable differences in location. JNDs in the horizontal plane also expand in size as the sound source moves away from the body, and the same applies to positioning on the vertical plane.
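The geometry behind these figures is simple arc length: at an angular resolution of θ degrees, two just-separable points a distance r away sit about r·θ (in radians) apart, which is why the JNDs widen with distance. A quick sketch of the arithmetic (standard geometry, not the author's calculation; the 0.6 m distance is an assumed example):

```python
import math

def point_spacing(distance_m, jnd_degrees):
    """Arc-length spacing, in meters, between two just-separable
    sound positions at the given distance from the listener."""
    return distance_m * math.radians(jnd_degrees)

def points_on_circle(jnd_degrees):
    """Number of separable sound positions on a full horizontal circle."""
    return int(360 / jnd_degrees)

# At Bekesy's two-degree JND, points about 0.6 m away are roughly
# 2 cm apart, and a full circle holds 180 separable positions;
# a one-degree JND would double that to 360.
spacing = point_spacing(0.6, 2.0)
```

Note that `point_spacing` grows linearly with distance, matching the observation in the text that shapes must be larger the further away they are placed.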
The preceding indicates that small sound shapes could be recognized near one's head, and that further away the shapes must be larger to be perceived clearly. The same is true with visual perception; you can read the date stamped on a penny only when it is near enough to your eyes. Considering that our eyes incorporate over a hundred million individual sensors (rods and cones) while the sense of hearing is limited to the simple in and out movement of two vibrating eardrums, psychoacoustic animation can never be expected to approach the resolution of visual sight, but a four-inch square at arm's length should be easily recognized.
The next question concerns how the sounds used to form Audio Images will be presented to a listener wearing headphones. The most logical method would seem to be employing a scanning system similar to that used in video monitors. In this case, the acoustic information would be presented in a rapid series of clicks resulting in a layered construction of the image. One potential problem with this method involves a phenomenon referred to as periodicity pitch. If you cause a tone to be interrupted, say, 1000 times per second, you will hear a "false tone" at a frequency of 1000 Hz. Generating 1000 sound bursts per second would also produce this same false tone, which could interfere with locational perception. It may be possible to avoid periodicity pitch problems by randomly varying the rate of presentation.
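A minimal sketch of that last idea, with assumed parameters rather than any scheme from the text: generate click onset times at a nominal 1000-per-second rate, then jitter each interval randomly so the train has no fixed period for the ear to extract as a false tone.

```python
import random

def click_times(n_clicks, rate_hz, jitter=0.0, seed=0):
    """Return n_clicks onset times in seconds. With jitter=0 the train
    is strictly periodic (and would evoke a periodicity pitch at
    rate_hz); jitter is the fractional random variation applied to
    each inter-click interval."""
    rng = random.Random(seed)  # seeded for reproducibility
    period = 1.0 / rate_hz
    t, times = 0.0, []
    for _ in range(n_clicks):
        times.append(t)
        t += period * (1.0 + rng.uniform(-jitter, jitter))
    return times

periodic = click_times(1000, 1000.0)              # every interval 1 ms
jittered = click_times(1000, 1000.0, jitter=0.3)  # intervals 0.7-1.3 ms
```

The jittered train keeps the same average rate, so the image is still painted at the same overall speed, but the randomized spacing removes the single repetition frequency that would otherwise be heard as a 1000 Hz tone.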
Alternate presentation methods include using a constant sound source, such as a musical signal, and then modifying that signal to include the necessary locational information. Another option might be to produce fewer sounds per second, and create a sonar-like "reflected" image. This is an idea similar to reproducing the acoustic "image" which bats and whales derive from bouncing sounds off objects in their environment. Such a "dimensionalized" type of singular image would probably be more difficult for the human auditory system to recognize than a sequential pattern, but it is an option worth further investigation.
There is also a possibility that Audio Images can be presented in a manner which would convey the impression of "color" or "texture". This would be accomplished by using variations in the timbre (tonal quality) of the sounds used to construct the images. If you once again imagine a ten-foot-square rotating cube, perhaps you can imagine each side of the cube having a different timbre. One side could sound like a guitar, another like a piano, etc. I hope to someday establish a set of sound/color guidelines that would standardize the relationships between synthesized tones and corresponding references to color. This would mean that the spectrum of color between violet and red would correspond to the spectrum of waveshape between sine waves (smooth sounds) and square waves (harsh sounds). It would require that both "audio artists" and listeners learn these relationships, but once accomplished, colors intended by the artist could be recognized by listeners. I believe a listener could learn to visualize the appropriate colors subconsciously, with minimal practice, but such concerns are a long way off at present.
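The proposed violet-to-red mapping could be realized by cross-fading between a sine wave and a square wave. This sketch is my own simplest reading of the idea: a linear blend of the two waveshapes, with the `color` parameter and all values being illustrative assumptions rather than anything specified in the text.

```python
import math

def colored_sample(phase, color):
    """One sample of a tone whose 'color' runs from 0.0 (pure sine,
    the smooth 'violet' end) to 1.0 (pure square, the harsh 'red'
    end). phase is in radians."""
    sine = math.sin(phase)
    square = 1.0 if sine >= 0.0 else -1.0
    return (1.0 - color) * sine + color * square

def colored_tone(freq_hz, color, sample_rate=44100, n_samples=1024):
    """A short buffer of the blended waveform at the given frequency."""
    step = 2.0 * math.pi * freq_hz / sample_rate
    return [colored_sample(i * step, color) for i in range(n_samples)]

violet = colored_tone(440.0, 0.0)  # smooth sine tone
red    = colored_tone(440.0, 1.0)  # harsh square tone
```

A linear waveform mix is the crudest possible interpolation; a perceptually smoother version would cross-fade the harmonic amplitudes instead, but even this simple blend gives each face of the imagined cube an audibly distinct timbre.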
Research has demonstrated that the blind have a level of hearing ability which is superior to that of sighted people in some respects. This improvement in sensitivity occurs because blind persons place more of their conscious attention upon what they hear. This implies that with practice, sighted people are capable of improved hearing, and subtleties in Audio Images might be perceivable which presently exceed our expectations.
The evidence from existing research indicates that psychoacoustic animation is a highly feasible yet significantly complex process. The technology for creating a mobility aid for the blind is even more complicated, but well within our present technological capacities.
In spite of the fact that I have yet to sit behind the wheel of a psychoacoustically animated formula-one race car making 90 degree turns at 200mph while roaring along backwards through a maze of subterranean tunnels with pulsating rock music thundering in my headphones, I am inclined to believe it may not be so long before I am actually able to do that - perhaps while sky diving with a portable compact disk player strapped to my belt, or after dropping a few quarters into an arcade game. Audio Images can be incorporated into motion pictures and broadcast on radio and television. The possibilities are truly amazing.
I have spent a large part of my life working toward being able to create and experience these effects, so perhaps my conviction that this process will work is somewhat less than objective. But if you can create an Audio Image simply by snapping your fingers, it seems difficult to believe that all the resources of modern science could not succeed in pumping the same circle out of a pair of headphones, eventually.
Think about it. Whether it takes two years or two thousand, can you believe psychoacoustic animation will never become a reality?
Posted September, 1995 - edited December 1999
1. Carver, Robert W. "Dimensionalized Sound Producing Apparatus and Method." U.S. Patent #4,218,585, April 1979. 23 pages, with illustrations.
2. Chowning, John M. "Artists in the Lab." Transcript of a PBS television program, Nova #817, first broadcast November 15th, 1981. Boston, MA.
3. "Holophonics": a binaural recording process; a dummy head used in these particular recordings has been patented in Europe by Hugo Zuccarelli.
4. Sakamoto, N., Gotoh, T. and Kimura, Y. "On Out of Head Localization in Headphone Listening." Journal of the Audio Engineering Society, November 1976, vol. 24, #9, New York.
5. Gelfand, Stanley A. Hearing: An Introduction to Psychological and Physiological Acoustics. M. Dekker, New York, 1981. 379 pages. Includes bibliographical references and indexes.
6. Chowning, John M. "The Simulation of Moving Sound Sources." Journal of the Audio Engineering Society, January 1971, vol. 19, #1, page 2.
7. Ticer, Scott. "Racking the Brain to Create 'Live' Stereo Sound: Psychoacoustics is Helping Hi-Fi Makers Figure Out What Music Is To Our Ears." Business Week, July 8th, 1985, page 53.
8. Von Bekesy, Georg. Experiments in Hearing. McGraw-Hill, New York, 1960. 745 pages, with illustrations.
9. See reference #5.
10. Niemeyer, W. and Starlinger, I. "Do The Blind Hear Better? Investigations in Auditory Processing in Congenital or Early Acquired Blindness" (part two). Audiology, 1981, vol. 20, pages 510-515.