The Vocoder
A vocoder (/ˈvoʊkoʊdər/, short for voice encoder) is an analysis/synthesis system used to reproduce human speech. In the encoder, the input is passed through a multiband filter, each band is passed through an envelope follower, and the control signals from the envelope followers are communicated to the decoder. The decoder applies these (amplitude) control signals to the corresponding filters in the synthesizer. Since the control signals change only slowly compared to the original speech waveform, the bandwidth required to transmit speech can be reduced. This allows more speech channels to share a radio circuit or submarine cable. By encrypting the control signals, voice transmission can be secured against interception.
The vocoder was originally developed in the 1930s as a speech coder for telecommunications applications. Transmitting the parameters of a speech model instead of a digitized representation of the speech waveform saves bandwidth in the communication channel, because the parameters of the model change relatively slowly compared to the speech waveform that they describe. Its primary use in this fashion is for secure radio communication, where voice has to be encrypted and then transmitted. The advantage of this method of "encryption" is that the original speech waveform is never sent, only the envelopes of the bandpass filter outputs.
Analog vocoders typically analyze an incoming signal by splitting it into a number of tuned frequency bands or ranges. A modulator signal and a carrier signal are each sent through a series of these tuned bandpass filters. In the example of a typical robot voice, the modulator is the signal from a microphone and the carrier is noise or a sawtooth waveform. There are usually between 8 and 20 bands.
The amplitude of the modulator for each of the individual analysis bands generates a voltage that is used to control amplifiers for each of the corresponding carrier bands. The result is that frequency components of the modulating signal are mapped onto the carrier signal as discrete amplitude changes in each of the frequency bands.
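The band-by-band process described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a reference design: the function names, the choice of a second-order bandpass filter, the logarithmic band spacing, the 30 Hz envelope cutoff and the filter Q are all hypothetical choices made for the example.

```python
import math
import random


def bandpass(signal, center_hz, q, sample_rate):
    """Second-order bandpass filter (unity gain at the center frequency)."""
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0, a1, a2 = 1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        # Feed-forward terms are +alpha (current) and -alpha (two samples ago).
        y = (alpha * x - alpha * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def envelope(signal, cutoff_hz, sample_rate):
    """Envelope follower: full-wave rectify, then one-pole low-pass smoothing."""
    coeff = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    level = 0.0
    out = []
    for x in signal:
        level = coeff * level + (1.0 - coeff) * abs(x)
        out.append(level)
    return out


def band_centers(n_bands, low_hz, high_hz):
    """Logarithmically spaced center frequencies (an assumed spacing)."""
    ratio = (high_hz / low_hz) ** (1.0 / (n_bands - 1))
    return [low_hz * ratio ** i for i in range(n_bands)]


def channel_vocoder(modulator, carrier, sample_rate, n_bands=8,
                    low_hz=200.0, high_hz=3200.0, q=4.0, env_cutoff_hz=30.0):
    """Impose the per-band amplitude envelope of the modulator on the carrier."""
    n = min(len(modulator), len(carrier))
    modulator, carrier = modulator[:n], carrier[:n]
    out = [0.0] * n
    for fc in band_centers(n_bands, low_hz, high_hz):
        # Analysis: envelope of the modulator in this band (the control signal).
        control = envelope(bandpass(modulator, fc, q, sample_rate),
                           env_cutoff_hz, sample_rate)
        # Synthesis: the same band of the carrier, scaled by the control signal.
        carrier_band = bandpass(carrier, fc, q, sample_rate)
        for i in range(n):
            out[i] += control[i] * carrier_band[i]
    return out


# Demo: a 500 Hz tone that stops halfway through, imposed on a noise carrier.
SAMPLE_RATE = 8000
N_SAMPLES = 4000
rng = random.Random(1)
modulator = [math.sin(2.0 * math.pi * 500.0 * i / SAMPLE_RATE)
             if i < N_SAMPLES // 2 else 0.0 for i in range(N_SAMPLES)]
carrier = [rng.uniform(-1.0, 1.0) for _ in range(N_SAMPLES)]
output = channel_vocoder(modulator, carrier, SAMPLE_RATE)
```

In the demo the output is audible noise while the tone is present and fades to silence once the modulator stops, because every band's control signal decays toward zero.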
Often there is an unvoiced band or sibilance channel. This carries frequencies that lie outside the analysis bands for typical speech but are still important for intelligibility. Examples are words that start with the letters s, f, ch or any other sibilant sound. These can be mixed with the carrier output to increase clarity. The result is recognizable speech, although somewhat "mechanical" sounding. Vocoders also often include a second system for generating unvoiced sounds, using a noise generator instead of the fundamental frequency.
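The choice between a pitched source and a noise generator can be illustrated with a small Python sketch. It assumes, for illustration only, that voiced sounds are driven by a pulse train at the fundamental frequency and unvoiced sounds by uniform white noise; the function name `excitation` and its parameters are hypothetical.

```python
import random


def excitation(n_samples, sample_rate, voiced, f0_hz=120.0, seed=0):
    """Return a source signal: a pulse train at f0 if voiced, else white noise."""
    if voiced:
        # One unit pulse per pitch period (assumed model of the voiced source).
        period = round(sample_rate / f0_hz)
        return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]
    # Unvoiced sounds such as "s" or "f": a noise generator replaces the pitch.
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


voiced_source = excitation(800, 8000, voiced=True, f0_hz=100.0)
unvoiced_source = excitation(800, 8000, voiced=False)
```

With an 8 kHz sampling rate and a 100 Hz fundamental, the voiced source has one pulse every 80 samples.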
SIGSALY (1943-1946) speech encipherment system
HY-2 Vocoder (designed in 1961), the last generation of channel vocoder in the US.
The first experiments with a vocoder were conducted in 1928 by Bell Labs engineer Homer Dudley, who was granted a patent for it on March 21, 1939. The Voder (Voice Operating Demonstrator) was introduced to the public at the AT&T building at the 1939-1940 New York World's Fair. The Voder consisted of a series of manually controlled oscillators, filters, and a noise source. The filters were controlled by a set of keys and a foot pedal to convert the hisses and tones into vowels, consonants, and inflections. This was a complex machine to operate, but a skilled operator could produce recognizable speech with it.
Dudley's vocoder was used in the SIGSALY system, which was built by Bell Labs engineers in 1943. SIGSALY was used for encrypted high-level voice communications during World War II. Later work in this field has been conducted by James Flanagan.
VOCODER Applications:
Terminal equipment for Digital Mobile Radio (DMR) based systems.
Digital Trunking
DMR TDMA
Digital Voice Scrambling and Encryption
Digital WLL
Voice Storage and Playback Systems
Messaging Systems
VoIP Systems
Voice Pagers
Regenerative Digital Voice Repeaters
Modern vocoder implementations
Even with the need to record several frequency bands, and the additional unvoiced sounds, the compression of the vocoder system is impressive. Standard speech-recording systems capture frequencies from about 500 Hz to 3400 Hz, where most of the frequencies used in speech lie, typically using a sampling rate of 8 kHz (slightly greater than the Nyquist rate). The sampling resolution is typically at least 12 bits per sample (16 is standard), for a final data rate in the range of 96-128 kbit/s. However, a good vocoder can provide a reasonably good simulation of voice with as little as 2.4 kbit/s of data.
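The figures above follow from multiplying the sampling rate by the bits per sample. A short Python check of that arithmetic, using only the numbers quoted in the paragraph (the function and constant names are illustrative):

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample):
    """Uncompressed single-channel data rate in bit/s."""
    return sample_rate_hz * bits_per_sample


SAMPLE_RATE_HZ = 8000      # 8 kHz sampling rate
VOCODER_BIT_RATE = 2400    # 2.4 kbit/s vocoder

for bits in (12, 16):
    rate = pcm_bit_rate(SAMPLE_RATE_HZ, bits)
    ratio = rate / VOCODER_BIT_RATE
    print(f"{bits}-bit samples: {rate / 1000:.0f} kbit/s, "
          f"about {ratio:.0f}x the vocoder rate")
# 12-bit samples: 96 kbit/s, about 40x the vocoder rate
# 16-bit samples: 128 kbit/s, about 53x the vocoder rate
```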
"Toll quality" voice coders, such as ITU G.729, are used in many telephone networks. G.729 in particular has a final data rate of 8 kbit/s with superb voice quality. G.723.1 achieves slightly worse quality at data rates of 5.3 kbit/s and 6.3 kbit/s. Many voice systems use even lower data rates, but below 5 kbit/s voice quality begins to drop rapidly.
Several vocoder systems are used in NSA encryption systems:
LPC-10, FIPS Pub 137, 2400 bit/s, which uses linear predictive coding
Code-excited linear prediction (CELP), 2400 and 4800 bit/s, Federal Standard 1016, used in STU-III
Continuously variable slope delta modulation (CVSD), 16 kbit/s, used in wide band encryptors such as the KY-57.
Mixed-excitation linear prediction (MELP), MIL-STD-3005, 2400 bit/s, used in the Future Narrowband Digital Terminal (FNBDT), NSA's 21st century secure telephone.
Adaptive Differential Pulse Code Modulation (ADPCM), former ITU-T G.721, 32 kbit/s, used in the STE secure telephone
(ADPCM is not a proper vocoder but rather a waveform codec. ITU has gathered G.721 along with some other ADPCM codecs into G.726.)
Vocoders are also currently used in psychophysics, linguistics, computational neuroscience and cochlear implant research.
Modern vocoders that are used in communication equipment and in voice storage devices today are based on the following algorithms:
Algebraic code-excited linear prediction (ACELP 4.7 kbit/s – 24 kbit/s)
Mixed-excitation linear prediction (MELPe 2400, 1200 and 600 bit/s)
Multi-band excitation (AMBE 2000 bit/s – 9600 bit/s)
Sinusoidal-Pulsed Representation (SPR 300 bit/s – 4800 bit/s)
Tri-Wave Excited Linear Prediction (TWELP 600 bit/s – 9600 bit/s)
Vocoders also have musical synthesizer applications, but those are not covered here because this page is about radio.

© Amateur Radio Station W5TXR and W5TXR 2012 All Rights Reserved All references to "Amateur Radio Station W5TXR" and "W5TXR" the Amateur Radio Station W5TXR logo are registered trade marks ®.

