Digital Audio
This article explains the effect of sampling (converting an analog signal to digital values) on audio. The basic theory of Analog-to-Digital conversion is described.
An analog signal has two major properties: frequency and amplitude. The frequency is the number of cycles per second, measured in hertz (Hz). The amplitude is the momentary strength (voltage) of the signal.

An analog audio signal is a continuously varying electrical signal. The process of converting an analog audio signal to digital values is called AD-conversion.
Sampling
The AD-converter measures the analog signal a number of times per second and converts the measured amplitude to a digital value. This is called sampling. One single digital value is called a sample. Thus, the digital values are a representation of the original analog signal.
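
The measuring step can be imitated in a few lines of Python. This is only a sketch: a mathematical sine wave stands in for the analog input, and the function name sample_sine and the values used (a 1 kHz tone, 8,000 samples per second) are illustrative assumptions, not part of any real converter.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, num_samples, amplitude=1.0):
    """Measure a sine wave at regular intervals, as an AD-converter would."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(num_samples)
    ]

# Hypothetical input: a 1 kHz tone measured 8,000 times per second.
# One full cycle of the tone is then described by 8 samples.
samples = sample_sine(freq_hz=1000, sample_rate_hz=8000, num_samples=8)
for n, value in enumerate(samples):
    print(n, round(value, 3))
```

Each printed value is one sample. Together the samples trace the rise and fall of one cycle of the wave.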
 
Sampling frequency
The number of samples taken per second is called the sampling frequency. The higher the sampling frequency (= the more samples taken per second), the more accurately the analog wave is converted. The sampling frequency also determines the highest audio frequency that can be converted: it must be at least twice the highest audio frequency in the signal. Half the sampling frequency is called the Nyquist frequency. The highest audio frequency that can be converted with a sampling frequency of 44.1 kHz is therefore 22.05 kHz. A brickwall filter removes frequencies higher than half the sampling frequency from the audio signal before it is sampled, because such frequencies would otherwise show up as false lower frequencies (aliasing).
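
The relation between sampling frequency and highest audio frequency can be checked with a small Python sketch. The function names are made up for this illustration, and alias_frequency assumes an idealized converter with no filter in front of it.

```python
def nyquist_frequency(sample_rate_hz):
    """Highest frequency that can be represented: half the sampling frequency."""
    return sample_rate_hz / 2

def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency at which a tone appears after sampling without any filtering.

    Tones above half the sampling frequency fold back to a lower frequency.
    """
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

print(nyquist_frequency(44100))       # 22050.0
print(alias_frequency(1000, 44100))   # 1000 (below the limit, unchanged)
print(alias_frequency(25000, 44100))  # 19100 (above the limit, folds back)
```

The last line shows why the filter is needed: an inaudible 25 kHz tone would be recorded as a clearly audible 19.1 kHz tone.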
Sample size
A sample consists of a series of bits (a digital word). Each sample has a certain accuracy. The amplitude of the original analog wave is rounded to the nearest digital value. The more digital values there can be, the more accurate the samples are. The more bits used in one sample, the more digital values there are and the more accurate the sample.

This directly affects the dynamic range of the AD-converter. In theory, each extra bit in a sample adds about 6 dB of dynamic range, so a 16-bit sample gives roughly 96 dB.
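
The figure of about 6 dB per bit follows from the fact that every extra bit doubles the number of digital values, and a doubling of amplitude equals 20 x log10(2), or roughly 6.02 dB. A short Python sketch of this theoretical calculation (the function name is an assumption for this example, and real converters reach less than the theoretical value):

```python
import math

def dynamic_range_db(bits):
    """Theoretical dynamic range: ratio between largest and smallest step, in dB."""
    return 20 * math.log10(2 ** bits)

for bits in (8, 16, 24):
    print(bits, "bits:", round(dynamic_range_db(bits), 1), "dB")
# 8 bits: 48.2 dB
# 16 bits: 96.3 dB
# 24 bits: 144.5 dB
```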

Clipping
Since the number of bits per sample determines the dynamic range, there is a fixed maximum and minimum amplitude for the audio signal that can be converted. The minimum amplitude (silence) would be converted to a digital value of all 0's. The maximum amplitude would be converted to a digital value of all 1's. If the AD-converter measures a higher input signal, clipping occurs: every value above the maximum is stored as the maximum, which flattens the peaks of the wave and distorts the sound. However, an analog wave is converted best if as many bits as possible are used. In other words: AD-conversion is the most accurate if the input signal uses the full dynamic range without exceeding it.
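
Rounding and clipping can be sketched together in Python. This follows the simplified model used in this article, in which the input level runs from 0.0 (silence) to 1.0 (maximum) and is stored as an unsigned number. The function name quantize and the normalized input range are assumptions made for the example.

```python
def quantize(level, bits):
    """Convert a normalized level (0.0 = silence, 1.0 = maximum) to a digital value.

    Levels outside the range are clipped to the lowest or highest value.
    """
    max_code = 2 ** bits - 1           # all 1's, e.g. 65535 for 16 bits
    code = round(level * max_code)     # round to the nearest digital value
    return max(0, min(max_code, code)) # clip to the available range

print(quantize(0.0, 16))  # 0      (all 0's)
print(quantize(1.0, 16))  # 65535  (all 1's)
print(quantize(1.2, 16))  # 65535  (too loud: clipped to the maximum)
```

A level of 1.2 gives the same digital value as a level of 1.0, so the information about the louder peak is lost.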

dBFS scale
The level scale commonly used in the digital domain is dBFS (decibels relative to Full Scale). The highest possible digital value is referred to as 0 dBFS, so all lower levels are negative values.
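
As an illustration, a level relative to full scale can be calculated as 20 x log10(value / full scale). The Python sketch below assumes signed 16-bit samples with a full-scale value of 32767. That number format and the function name to_dbfs are assumptions for this example.

```python
import math

def to_dbfs(sample_value, full_scale=32767):
    """Level of a sample value relative to full scale, in dB."""
    if sample_value == 0:
        return float("-inf")  # digital silence has no finite dB value
    return 20 * math.log10(abs(sample_value) / full_scale)

print(to_dbfs(32767))            # 0.0 (full scale)
print(round(to_dbfs(16384), 1))  # -6.0 (half of full scale)
print(round(to_dbfs(328), 1))    # -40.0 (about 1% of full scale)
```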
Storing digital audio
Once the audio is digital, it is in fact binary data. This digital data can be stored as a file on a hard disk. The amount of disk space needed to store the audio file can easily be calculated.
Example:
Sampling rate: 44.1 kHz
Sample size: 16 bits
Bits per minute: (44,100 x 16) x 60 = 42,336,000 bits
or 5,292 kbyte/minute, or about 5.3 MB/minute
The disk space needed for PCM digital audio with a sampling rate of 44.1 kHz and a sample size of 16 bits is approximately:
mono: 5.3 MB/minute
stereo: 10.6 MB/minute
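
The same calculation as a small Python sketch, for trying other sampling rates and sample sizes. The function name is made up for this example, and the result covers the raw audio data only (no file header).

```python
def pcm_bytes_per_minute(sample_rate_hz, bits_per_sample, channels=1):
    """Disk space in bytes for one minute of uncompressed PCM audio."""
    bits_per_second = sample_rate_hz * bits_per_sample * channels
    return bits_per_second * 60 // 8  # 60 seconds per minute, 8 bits per byte

mono = pcm_bytes_per_minute(44100, 16)
stereo = pcm_bytes_per_minute(44100, 16, channels=2)
print(mono)    # 5292000  (about 5.3 MB)
print(stereo)  # 10584000 (about 10.6 MB)
```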
Audio compression techniques
Linear PCM audio (uncompressed) produces quite a high bitstream. In the example above, the bitstream would be 44,100 x 16 = 705.6 kbit/s per channel. In some situations a lower bitstream is desired. For example, audio compression is required to 'fit' the bitstream into the limited bandwidth of an ISDN line (max. 128 kbit/s). Reducing the bitstream can be done with audio compression techniques.
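
To get a feeling for how much reduction is needed, the PCM bitstream can be compared with the capacity of the line. The Python sketch below uses the figures from the text. The function names, and the choice of a stereo signal, are assumptions for this example.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels=1):
    """Bitstream of uncompressed PCM audio in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000

def required_reduction(source_kbps, line_kbps):
    """Factor by which the bitstream must shrink to fit the line."""
    return source_kbps / line_kbps

mono = pcm_bitrate_kbps(44100, 16)
stereo = pcm_bitrate_kbps(44100, 16, channels=2)
print(mono)                                       # 705.6
print(stereo)                                     # 1411.2
print(round(required_reduction(stereo, 128), 1))  # 11.0
```

A stereo signal would therefore have to be reduced to about one eleventh of its original bitstream to fit a 128 kbit/s line.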
Almost any type of audio compression is based on limitations of human hearing. The human ear is sometimes unable to hear certain sounds because they are concealed by other sounds. For example, you cannot hear the sound of a bird up in the sky when there is a train running by, even though the bird is still producing sound. The louder sound of the train conceals the sound of the bird. Also, the frequency response of the human ear is certainly not linear. The human ear is much more sensitive to frequencies around 2 to 5 kHz than to very low or very high frequencies.
Another (more musical) example: A musical piece can have a loud snare drum and a much softer keyboard. At the moment the drummer hits the snare, your ears become temporarily 'deaf' to the much softer keyboard. This is called masking. The snare drum masks the keyboard. Your ears recover almost instantly.
Several audio compression techniques (MPEG, ATRAC, MUSICAM) are based on using the available bits when and where they are needed most. In the example above, the audio compression codec would use more bits to accurately encode the snare drum, but would use fewer bits for the keyboard, since it is masked by the snare drum and therefore less important at that moment. Most audio compression codecs first divide the audio spectrum into bands and dynamically assign bits to each frequency band. The result is a much lower bitstream, depending on the amount of data compression applied.
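
The idea of assigning bits per frequency band can be illustrated with a toy Python sketch. This is not the algorithm of MPEG, ATRAC, MUSICAM or any other real codec. The function name, the band levels and the masking thresholds are all invented for the example, and the only rule used is the theoretical one that each bit lowers the rounding noise in a band by about 6 dB.

```python
def allocate_bits(band_levels_db, mask_thresholds_db, bit_budget):
    """Toy allocation: give one bit at a time to the band with the most audible noise.

    band_levels_db:     signal level in each frequency band (hypothetical values)
    mask_thresholds_db: level below which sound in that band is inaudible
    """
    bits = [0] * len(band_levels_db)
    for _ in range(bit_budget):
        # How far the rounding noise in each band still sticks out above the mask.
        audible = [
            level - mask - 6.02 * b
            for level, mask, b in zip(band_levels_db, mask_thresholds_db, bits)
        ]
        worst = max(range(len(audible)), key=audible.__getitem__)
        if audible[worst] <= 0:
            break  # all remaining noise is masked, so no more bits are needed
        bits[worst] += 1
    return bits

# Hypothetical moment in the music: loud snare drum in band 0,
# soft keyboard in band 1 (mostly masked), band 2 fully masked.
print(allocate_bits([90, 60, 40], [30, 55, 50], bit_budget=12))  # [10, 1, 0]
```

The band with the snare drum receives almost all of the bits, the mostly masked keyboard band receives one, and the fully masked band receives none.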
For audio quality, using no audio compression (data reduction) at all is always better. Although many modern audio compression techniques are very sophisticated, they still 'take something out' of the original audio signal. They always affect the audio and take some of the 'transparency' of digital audio away. Depending on the content of the audio, this can be clearly audible. Linear uncompressed PCM with a high sampling rate and sample size is the most accurate way of converting an analog signal to a digital signal.
NOTE: Some elements of digital audio have been simplified in the article above. Common techniques like dither, oversampling and noise-shaping are intentionally left out because they are beyond the scope of this article.