Digital Audio Basics

Mark Henninger's excellent piece about whether or not high-end audio is obsolete got me thinking about a
related topic that's been in the AV news a lot lately: high-resolution audio. As most AVS readers probably
know, LPCM (linear pulse-code modulation) digital audio, the most common type used in commercial music
recording, has two basic parameters that define its resolution: sampling frequency (aka sample rate) and bit
depth.
SACD uses a different type of digital audio called DSD (Direct Stream Digital), which I'll put aside for the
moment. Also, sampling frequency and bit depth don't apply to lossy compressed formats like MP3, only to the
original files that were used to create them, so I won't cover those formats here.
Before I address the question in the title, I'd like to make sure everyone is up to speed on the basics of digital
audio. If you're familiar with those basics, you can skip to the next subhead.
Digital Audio Basics

In all analog-audio electrical signals, the voltage smoothly rises and falls between a minimum and maximum
value in a pattern called the waveform. In most cases, the waveform is quite complex, and it determines the tone
or timbre of the sound it represents.
All complex waveforms can be separated into pure tones whose waveform is a sine wave with a single
frequency and a certain amplitude (the difference in voltage between the peaks and troughs of the sine
waveform); this process is called Fourier analysis. If you were to combine all the pure tones at the proper levels,
you would end up with the original complex waveform. (Actually, it's not as simple as this, because the waveform
varies over time as well, but the basic concept still applies.)
When added together, the sine waves combine to form the complex waveform.
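To make the idea concrete, here is a minimal Python sketch of building a complex waveform from pure tones and then measuring each tone's level again. The three partials (100, 200, and 300 Hz) and the 8 kHz sample rate are arbitrary values I picked for illustration, and the single-frequency measurement is a simplified stand-in for full Fourier analysis.

```python
import math

def synthesize(partials, sample_rate, num_samples):
    """Sum sine waves given as (frequency_hz, amplitude) pairs."""
    return [
        sum(a * math.sin(2 * math.pi * f * n / sample_rate) for f, a in partials)
        for n in range(num_samples)
    ]

def amplitude_at(signal, freq_hz, sample_rate):
    """Estimate the amplitude of one sine component (single-bin DFT)."""
    re = sum(x * math.cos(2 * math.pi * freq_hz * n / sample_rate)
             for n, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq_hz * n / sample_rate)
             for n, x in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)

# Hypothetical tone: a 100 Hz fundamental plus two weaker harmonics.
partials = [(100, 1.0), (200, 0.5), (300, 0.25)]
wave = synthesize(partials, sample_rate=8000, num_samples=8000)

for freq, _ in partials:
    print(freq, "Hz:", round(amplitude_at(wave, freq, 8000), 3))
```

The measured levels should match the amplitudes that went in, while a frequency that was never added (say, 400 Hz) should measure essentially zero.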
In the process of sampling an analog-audio signal, the instantaneous voltage of the signal is measured multiple
times as it rises and falls during each cycle of the waveform. The number of times this measurement is taken per
second is the sampling frequency. Each measurement, or sample, is represented by a digital number that
includes a certain number of bits; this is the bit depth. The higher the sampling frequency, the more
measurements are taken per second, and the greater the bit depth, the more accurate each of those measurements
is.
As the sampling frequency and bit depth increase, the measurements of the instantaneous level of the analog
waveform become more accurate. Also, as bit depth increases, so does the dynamic range, which is represented
here by depicting the 24-bit samples as taller than the 16-bit sample. The "size" of each bit in all three graphs
remains the same. Note that there are far more than 16 steps in a 16-bit sample; in fact, there are 65,536 steps.
These graphs are meant as conceptual illustrations, not to be numerically accurate.
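As a rough sketch of the sampling process, the following Python snippet measures a sine wave at regular intervals and stores each measurement as a whole number. The function name and the 1 kHz test tone are my own illustrative choices, and for simplicity the code rounds each measurement to the nearest integer code.

```python
import math

def sample_and_quantize(freq_hz, sample_rate, bit_depth, num_samples):
    """Measure a full-scale sine at regular intervals and store each
    measurement as a signed integer of the given bit depth."""
    max_code = 2 ** (bit_depth - 1) - 1          # e.g. 32767 for 16 bits
    codes = []
    for n in range(num_samples):
        t = n / sample_rate                      # time of the n-th sample
        voltage = math.sin(2 * math.pi * freq_hz * t)   # -1.0 .. +1.0
        codes.append(round(voltage * max_code))  # assumption: round to nearest
    return codes

# The same 1 kHz tone captured at 44.1 kHz with two different bit depths.
print(sample_and_quantize(1000, 44100, 16, 5))
print(sample_and_quantize(1000, 44100, 8, 5))
```

The sample times are identical in both cases; only the fineness of the numbers changes with bit depth.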
The sampling frequency establishes the highest audio frequency that can be accurately represented in the digital
information. According to a well-established tenet called the Nyquist Theorem, any analog signal with a
frequency below half the sampling frequency can, in principle, be sampled and reconverted back to its
analog form with perfect fidelity. (Remember that most analog-audio signals include many frequencies in
combination, and the Nyquist Theorem applies to each of them individually.)
If frequencies above half the sampling frequency (which is called the Nyquist frequency) are digitized, they
create lower-frequency artifacts that were not in the original signal. To avoid these so-called aliasing artifacts,
the analog input signal is first sent through a lowpass filter that removes any frequencies above the Nyquist
frequency.
In this diagram, a waveform (red) is sampled at less than twice its frequency, resulting in a low-frequency
aliasing artifact (blue).
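The frequency at which an aliasing artifact appears can be worked out with simple arithmetic: the input frequency "folds" around multiples of the sampling frequency. Here is a small Python sketch of that folding; the function name is mine, and it covers only a single pure tone.

```python
def alias_frequency(signal_hz, sample_rate_hz):
    """Apparent frequency of a pure tone after sampling without an
    anti-aliasing filter."""
    folded = signal_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

for tone in (10_000, 25_000, 30_000):
    print(tone, "Hz sampled at 44.1 kHz appears at",
          alias_frequency(tone, 44_100), "Hz")
```

A 10 kHz tone is below the Nyquist frequency and comes through unchanged, while 25 kHz and 30 kHz tones fold down to 19.1 kHz and 14.1 kHz, squarely in the audible range.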
When the digital signal is converted back to analog, it first looks like a series of stairsteps that roughly follows
the shape of the original waveform. These stairsteps correspond to the values that were sampled when the signal
was digitized. Such a stairstep waveform includes many high-frequency components, or harmonics, which are
removed by sending the signal through another lowpass filter (also called a reconstruction filter). This removes
all frequencies above the Nyquist frequency, returning the waveform to its original shape. Theoretically, and
amazingly, no information whatsoever from the original waveform is lost in this process.
A waveform is sampled, converting it into a series of numbers. When the numbers are converted back to analog,
they start as a stairstep approximation of the original waveform. A lowpass reconstruction filter restores the
waveform's original shape.
As I mentioned earlier, the bit depth is the number of bits used to represent each sampled value, from the peaks
to the troughs of the waveform. With a bit depth of 8 bits, 256 different values can be represented (2^8 = 256);
with 16 bits, 65,536 different values can be represented (2^16 = 65,536), and with 24 bits, 16,777,216 different
values can be represented (2^24 = 16,777,216). The bit depth determines the maximum dynamic range that can be
represented: each bit adds roughly 6 dB to the dynamic range, so 16 bits corresponds to a dynamic range of
about 96 dB and 24 bits corresponds to a dynamic range of about 144 dB.
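The "roughly 6 dB per bit" figure comes from expressing the ratio between the largest and smallest representable values in decibels, 20 x log10(2^bits). A quick Python check, with function names of my own choosing:

```python
import math

def quantization_levels(bit_depth):
    """Number of distinct values a sample of this bit depth can take."""
    return 2 ** bit_depth

def dynamic_range_db(bit_depth):
    """Theoretical dynamic range: ratio of full scale to one step, in dB."""
    return 20 * math.log10(quantization_levels(bit_depth))

for bits in (8, 16, 24):
    print(f"{bits} bits: {quantization_levels(bits):,} levels, "
          f"{dynamic_range_db(bits):.1f} dB")
```

Each added bit doubles the number of levels, and doubling a voltage ratio adds about 6.02 dB.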
Because there is a finite number of possible values for each sample, most samples will not correspond exactly
with the instantaneous voltage of the analog waveform, so the value of the sample is the closest it can be
without exceeding the voltage it represents. The difference between the actual instantaneous voltage and the
sampled value is called the quantization error.
Occasionally, the voltage and the sampled value are precisely equal, in which case the quantization error is 0,
but most of the time, the samples differ from the voltage by varying amounts. So quantization error is often
expressed as an average over many samples. The greater the bit depth, the more accurate all sample values are
and the lower the average quantization error, which is also called quantization noise or distortion. This defines
the noise floor, below which no actual signal can be represented or reproduced.
In this image, the green curve is the original waveform, and the yellow curve is the waveform resulting from
quantization. The red curve represents the quantization errors or quantization noise.
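Here is a minimal Python sketch of how the average quantization error shrinks as bit depth grows. It quantizes a test sine by choosing the closest level that does not exceed the voltage, as described above, and reports the RMS error. The test signal and sample count are arbitrary assumptions for illustration.

```python
import math

def quantization_error_rms(bit_depth, num_samples=10_000):
    """RMS difference between a sine wave and its quantized version."""
    step = 2.0 / (2 ** bit_depth)        # full scale spans -1.0 .. +1.0
    total = 0.0
    for n in range(num_samples):
        voltage = math.sin(2 * math.pi * n * 0.01234567)  # arbitrary test tone
        quantized = math.floor(voltage / step) * step     # closest level not above
        total += (voltage - quantized) ** 2
    return math.sqrt(total / num_samples)

for bits in (8, 16, 24):
    print(f"{bits}-bit RMS quantization error: {quantization_error_rms(bits):.3e}")
```

Going from 8 to 16 bits makes each step 256 times smaller, so the average error drops by about the same factor.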
CDs store LPCM digital audio with a sampling frequency of 44.1 kHz and a bit depth of 16 bits, often specified
with the shorthand designation 44.1/16 or 16/44.1. (Many professional digital-audio systems use a sampling
frequency of 48 kHz with a bit depth of 16 bits.) Why were these values chosen? According to most research,
humans can't hear frequencies above 20 kHz, so if the sampling frequency is more than twice that, all audible
frequencies can be accurately represented as digital data.
The dynamic range encompassed by healthy human hearing, that is, the difference between the softest sound
we can perceive (the threshold of hearing) and the loudest sound we can perceive without pain (the threshold of
pain), is around 140 dB, which is more than the theoretical maximum of 96 dB represented by 16 bits. But
using more bits would require more storage capacity, and one of the design goals of the CD format was the
ability to store at least 60 minutes of audio, so 16 bits was deemed sufficient while allowing that goal to be
achieved.
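The storage trade-off is easy to work out: uncompressed LPCM needs sample rate x bit depth x channels bits every second. A short Python sketch, assuming two-channel stereo and a 60-minute program (the function name is my own):

```python
def pcm_bytes(sample_rate_hz, bit_depth, channels, seconds):
    """Storage needed for uncompressed LPCM audio, in bytes."""
    return sample_rate_hz * bit_depth * channels * seconds // 8

one_hour = 60 * 60
print("16/44.1 stereo:", pcm_bytes(44_100, 16, 2, one_hour) / 1e6, "MB")
print("24/44.1 stereo:", pcm_bytes(44_100, 24, 2, one_hour) / 1e6, "MB")
```

An hour of 16-bit stereo at 44.1 kHz comes to about 635 MB, and moving to 24 bits would have increased that by half.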
Higher-resolution LPCM audio recordings, typically 24-bit/96 kHz or even 24/192, can be distributed on Blu-ray or DVD-Audio discs or made available for downloading from websites. A bit depth of 24 bits represents a
theoretical dynamic range of about 144 dB, and a sampling rate of 96 kHz can accurately represent frequencies
up to 48 kHz; a sampling rate of 192 kHz can represent frequencies up to 96 kHz.
However, it's important to verify that the original recordings were made at the higher resolution and not
upconverted from 16/44.1, which would negate any potential improvement in the audio quality. Then there's the
issue of an analog master tape being digitized at 24/96 or 24/192, the value of which is debatable, since
professional analog-audio tape has a dynamic range of 60-70 dB without noise reduction.
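To put the tape figure in digital terms, you can divide a dynamic range in dB by the roughly 6.02 dB that each bit contributes. This is only a back-of-the-envelope conversion, sketched in Python with a function name of my own:

```python
import math

DB_PER_BIT = 20 * math.log10(2)   # about 6.02 dB

def equivalent_bits(dynamic_range_db):
    """Approximate bit depth needed to cover a given dynamic range."""
    return dynamic_range_db / DB_PER_BIT

for db in (60, 70):
    print(f"{db} dB is roughly {equivalent_bits(db):.1f} bits")
```

By this measure, a 60-70 dB analog master occupies only about 10 to 12 bits of a 24-bit container.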
Another form of high-resolution audio is DSD (Direct Stream Digital), the digital-audio format used on SACD
discs. DSD uses a very high sampling rate of 2.8 MHz and a bit depth of only one bit, but it uses a different
encoding scheme called pulse density modulation (PDM), so it's not directly comparable to LPCM. According
to Wikipedia, it's approximately equivalent to 20-bit/96 kHz LPCM.
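Although the encoding schemes differ, the raw data rates can still be compared. The sketch below uses the exact DSD rate of 2.8224 MHz, which is 64 times the CD sampling frequency; the function name is mine, and the comparison says nothing about audible quality.

```python
def bits_per_second_per_channel(sample_rate_hz, bit_depth):
    """Raw data rate of one audio channel."""
    return sample_rate_hz * bit_depth

dsd = bits_per_second_per_channel(2_822_400, 1)
cd = bits_per_second_per_channel(44_100, 16)
hires = bits_per_second_per_channel(96_000, 24)

print("DSD:     ", dsd, "bit/s")
print("16/44.1: ", cd, "bit/s")
print("24/96:   ", hires, "bit/s")
print("DSD rate as a multiple of 44.1 kHz:", 2_822_400 // 44_100)
```

Per channel, DSD carries four times the raw data of CD and somewhat more than 24/96 LPCM.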
There is much more to digital audio than I have explained here, but this is enough to understand the issues of
high-resolution audio and whether or not it is irrelevant.
High-Resolution Audio Goes Mainstream

Recently, digital audio with higher resolution than CD has gotten a lot of attention, especially with the news that
Neil Young is moving ahead with his PonoMusic project now that its Kickstarter crowd-funding campaign has
raised over $6 million. Young proposes to distribute commercial music recorded at a sampling frequency of 96
or even 192 kHz and a bit depth of 24 bits, which will be playable on a portable Pono Player built by high-end
manufacturer Ayre Acoustics.
The Pono Player will include hardware from Ayre Acoustics.
Young is not the first to distribute high-res music files. AIX Records has been recording and distributing 24/96
music files on DVD-Audio and Blu-ray discs since 2000, and it launched itrax.com, the first high-resolution
audio-download site, in the fall of 2007. Chesky Records sells SACDs, DVD-Audio discs, and DVD-ROM
discs with 24/192 audio files that you can copy to a computer hard drive. Other sources of high-res downloads
include Bowers & Wilkins, Linn Records, Naim Label, and 2L. Another well-known source is HDtracks.com,
but some audiophiles suspect that some of its files are upconverted from 16/44.1; see Polk Audio's forum for a
discussion of this.
The crux of the question I pose in the title of this thread is whether or not true high-resolution audio (recorded,
edited, mastered, and distributed in 24/96, 24/192, or DSD) offers an audible improvement over the good ol'
16/44.1 audio found on CDs. And as you might imagine, there is much debate over this proposition.
For example, it seems clear that a bit depth of 24 bits could potentially sound better than 16 bits, since the
dynamic range of human hearing is about 140 dB. But very few recordings are made without some form of
dynamic compression. (I'm not talking about data compression like MP3.) In the case of most popular music,
the dynamic range is severely compressed so that everything can be heard in the presence of road noise in a car
or a city street while you're out for a stroll or bike ride.
In terms of frequency range, traditional research has established that humans can't hear above 20 kHz (and
virtually all adults can't hear anywhere near that high), so a sampling rate of 44.1 kHz should be more than
enough, especially since the Nyquist Theorem states that all frequencies less than half the sampling frequency
can be reconstructed with perfect accuracy. The problem here is that the anti-aliasing input filter and
reconstruction output filter must have very steep slopes to allow 20 kHz to pass unattenuated while completely
blocking 22.05 kHz and above. This type of "brick-wall" filter is very difficult to design and implement without
introducing some audible artifacts of its own, at least in the analog domain. By using a higher sampling
frequency, the slope of these filters can be much more gradual, which results in far fewer artifacts.
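A quick calculation shows how much room the filter has to work with at each sampling frequency. This Python sketch assumes the filter must pass everything up to 20 kHz and block everything from the Nyquist frequency (half the sampling frequency) upward; the function name is my own.

```python
import math

def transition_band(sample_rate_hz, passband_hz=20_000):
    """Width of the region between the top of the audio band and the
    Nyquist frequency, in Hz and in octaves."""
    nyquist = sample_rate_hz / 2
    width_hz = nyquist - passband_hz
    octaves = math.log2(nyquist / passband_hz)
    return width_hz, octaves

for rate in (44_100, 96_000, 192_000):
    width, octaves = transition_band(rate)
    print(f"{rate} Hz: {width:.0f} Hz wide, {octaves:.2f} octaves")
```

At 44.1 kHz the filter has only about a seventh of an octave in which to go from full level to silence; at 96 kHz it has well over an octave.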
Then there's the issue of whether or not ultrasonic frequencies above 20 kHz somehow affect the audible range,
even though we can't hear them directly. For example, some believe that ultrasonic harmonics interact with each
other, producing what are called difference or interference tones down in the audible range. So capturing and
reproducing those harmonics could affect the sound we can hear, as many listeners claim they do.
On the other hand, you need some unusually capable equipment to record and reproduce frequencies above 20
kHz. Some speakers can do it; for example, Sony's new Core Series of speakers are spec'd up to 50 kHz for the
SS-CS3 floorstander and SS-CS5 bookshelf, and they're not even that expensive ($480/pair for the SS-CS3,
$220/pair for the SS-CS5); for more info about these speakers, see our coverage here. In fact, Sony is placing a
lot of emphasis on high-resolution audio in many of its new products.
Assuming the ADC (analog-to-digital converter), DAC (digital-to-analog converter), and all digital electronics
in the recording and playback chain are capable of accurately representing 24/96 or higher, what about the other
analog components in the signal chain, including microphones, preamps, and power amps, along with the
analog portions of the converters? If any of them can't support at least 48 kHz and 140 dB of dynamic range, the
effort to record and deliver 24/96 audio, not to mention 24/192, is moot.
Argument For the Proposition

Aside from Mark Henninger's piece about whether or not high-end audio is obsolete, the article that inspired me
to write this post is "24/192 Music Downloads...and why they make no sense" by Monty Montgomery on
xiph.org. Among the arguments in this article is the assertion that all transducers and amplifiers exhibit some
amount of distortion, which increases at the lowest and highest frequencies. In particular, reproducing ultrasonic
frequencies leads to intermodulation distortion that can extend into the audible range. Thus, it's better not to
encode ultrasonics to avoid any possibility of intermodulation distortion.
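To illustrate the mechanism, here is a simplified Python sketch in which two ultrasonic tones pass through a slightly nonlinear device. The tone frequencies (30 and 33 kHz), the 192 kHz sample rate, and the 10% second-order distortion term are all hypothetical values I chose for the demonstration, not measurements of any real amplifier or speaker.

```python
import math

def amplitude_at(signal, freq_hz, sample_rate):
    """Estimate the amplitude of one sine component (single-bin DFT)."""
    re = sum(x * math.cos(2 * math.pi * freq_hz * n / sample_rate)
             for n, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq_hz * n / sample_rate)
             for n, x in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)

SAMPLE_RATE = 192_000
NUM_SAMPLES = 1_920            # 10 ms of signal

# Two inaudible tones at 30 kHz and 33 kHz.
clean = [
    math.sin(2 * math.pi * 30_000 * n / SAMPLE_RATE)
    + math.sin(2 * math.pi * 33_000 * n / SAMPLE_RATE)
    for n in range(NUM_SAMPLES)
]

# Hypothetical nonlinear device: adds a small squared term to the signal.
distorted = [x + 0.1 * x * x for x in clean]

print("3 kHz level before distortion:", round(amplitude_at(clean, 3_000, SAMPLE_RATE), 4))
print("3 kHz level after distortion: ", round(amplitude_at(distorted, 3_000, SAMPLE_RATE), 4))
```

The clean signal has nothing at 3 kHz, but the distorted one contains a tone at the 3 kHz difference frequency, which is well within the audible range.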
Montgomery also points out that, while an analog anti-aliasing filter works better if its slope is gradual as
explained earlier, a digital anti-aliasing filter has no such limitation. If you sample at a high sampling
frequency (say, 96 or 192 kHz), you can apply a digital lowpass filter that simply discards the ultrasonic
components, and you're left with a 44.1 kHz dataset that has no aliasing artifacts.
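Here is a rough Python sketch of that idea. To keep the arithmetic simple, I assume a capture rate of 176.4 kHz (exactly four times 44.1 kHz) rather than 96 or 192 kHz, and I use a basic windowed-sinc lowpass filter with a 21 kHz cutoff. Real converters and resamplers use more refined filters, so treat this as an illustration of the principle only.

```python
import math

def lowpass_taps(cutoff_hz, sample_rate, num_taps=255):
    """Windowed-sinc (Hamming) FIR lowpass, normalized to unity gain at DC."""
    fc = cutoff_hz / sample_rate                 # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for k in range(num_taps):
        x = k - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]

def decimate(signal, factor, taps=None):
    """Keep every `factor`-th sample, optionally lowpass filtering first."""
    if taps is None:
        return signal[::factor]
    span = len(taps)
    return [sum(h * signal[start + k] for k, h in enumerate(taps))
            for start in range(0, len(signal) - span + 1, factor)]

def amplitude_at(signal, freq_hz, sample_rate):
    """Estimate the amplitude of one sine component (single-bin DFT)."""
    re = sum(x * math.cos(2 * math.pi * freq_hz * n / sample_rate)
             for n, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq_hz * n / sample_rate)
             for n, x in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)

HIGH_RATE, CD_RATE, FACTOR = 176_400, 44_100, 4

# Test signal: an audible 1 kHz tone plus an ultrasonic 30 kHz tone.
high_rate = [
    math.sin(2 * math.pi * 1_000 * n / HIGH_RATE)
    + 0.5 * math.sin(2 * math.pi * 30_000 * n / HIGH_RATE)
    for n in range(2015)
]

naive = decimate(high_rate, FACTOR)[:441]
filtered = decimate(high_rate, FACTOR, lowpass_taps(21_000, HIGH_RATE))[:441]

# Without filtering, 30 kHz would alias to 44.1 - 30 = 14.1 kHz.
print("alias at 14.1 kHz, unfiltered:", round(amplitude_at(naive, 14_100, CD_RATE), 4))
print("alias at 14.1 kHz, filtered:  ", round(amplitude_at(filtered, 14_100, CD_RATE), 4))
print("1 kHz tone, filtered:         ", round(amplitude_at(filtered, 1_000, CD_RATE), 4))
```

Simply throwing away three of every four samples leaves a strong alias at 14.1 kHz, while filtering first removes nearly all of it and leaves the audible tone at essentially its original level.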
Regarding the use of 24 bits instead of 16, the article argues that the threshold of hearing increases with age and
hearing damage, and the threshold of pain decreases, reducing the dynamic range of human hearing as we get
older. Also, a technique called dithering, which adds a bit of noise to the signal to mask quantization noise,
allows amplitudes of less than one bit to be encoded and reproduced.
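A small Python sketch can show how dither lets a level smaller than one quantization step survive. It assumes triangular (TPDF) dither spanning plus or minus one step, which is a common choice but not necessarily what any particular mastering tool uses, and it works in units where 1.0 equals one step.

```python
import random

def quantize(value):
    """Round to the nearest whole quantization step (1.0 = one step)."""
    return round(value)

def average_output(level, dither, trials=20_000, seed=1):
    """Average quantized value of a constant input, with or without dither."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # Assumed TPDF dither: difference of two uniform random numbers.
        noise = (rng.random() - rng.random()) if dither else 0.0
        total += quantize(level + noise)
    return total / trials

level = 0.3   # a signal level of less than one step
print("without dither:", average_output(level, dither=False))
print("with dither:   ", average_output(level, dither=True))
```

Without dither, the 0.3-step signal always rounds to zero and is lost. With dither, individual samples are noisier, but their average lands very close to 0.3, so the information is preserved beneath the noise.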
Finally, Montgomery points out that if the loudest possible undistorted sound is defined as 0 dB, the
quantization-noise floor is -96 dB with a bit depth of 16 bits. But this is the RMS noise floor of the entire
broadband signal, and each hair cell in the inner ear is sensitive to a narrow fraction of the total bandwidth,
which means the noise floor of each hair cell is much lower than -96 dB. With the use of dither, the article
claims that the practical dynamic range of a 16-bit digital audio signal is actually more like 120 dB.
The article does acknowledge that using more than 16 bits is important during recording, mixing, and mastering
to avoid clipping and allow digital signal processing without raising the noise floor to objectionable levels. But
once the music is ready to be distributed, there is no reason to use more than 16 bits.
After all this theory, Montgomery cites some empirical tests performed by the Boston Audio Society (BAS) in
which listeners were played high-resolution DVD-Audio and SACD content and the same content
downsampled to 16/44.1 on the spot (no dithering), and they were asked to identify which was which. The tests
were said to be conducted using high-end equipment in noise-isolated environments with both amateur and
trained professional listeners. In over 500 trials, listeners chose correctly 49.8% of the time, which is no better
than random chance.
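To see why a result like that is indistinguishable from guessing, you can compute how often pure coin-flipping would do at least as well. The Python sketch below assumes exactly 500 trials with 249 correct answers (49.8%) purely for illustration; the actual trial count in the study was higher than 500.

```python
from math import comb

def p_value_at_least(correct, trials, chance=0.5):
    """Probability of getting at least `correct` answers right by guessing."""
    return sum(
        comb(trials, k) * chance ** k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )

# Hypothetical counts matching the reported 49.8% success rate.
print("249 of 500:", round(p_value_at_least(249, 500), 3))
# For contrast: what a clearly audible difference might look like.
print("300 of 500:", f"{p_value_at_least(300, 500):.2e}")
```

Guessing alone would match or beat 249 out of 500 more than half the time, whereas a score of 300 out of 500 would be extremely unlikely to happen by chance.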
Argument Against the Proposition

One of the staunchest and longest-active advocates for high-resolution audio is Dr. Mark Waldrep, founder and
chief engineer for AIX Records. Waldrep responds to part of the xiph.org article on his website, realhd-audio.com, in a post entitled "24-Bits Makes Sense!" Waldrep acknowledges that most pop/rock recordings and
some classical and jazz recordings are subjected to dynamic-range compression, and that most commercial
music does not exceed a dynamic range of 96 dB even without compression. But he has, in fact, recorded pieces
that do exceed this dynamic range and thus benefit from 24-bit resolution.
In this graph, you can see the dynamic range of human hearing, a typical room, music, and an analog-audio
signal. (Courtesy RealHD-Audio.com)
When I asked Waldrep about the xiph.org article, he said, "I agree with Monty that we do not derive any sonic
benefit from sample rates higher than 96 kHz. But he's incorrect about the 24-bits claim. His statement that 16-bit CDs can deliver more than 96 dB requires some fancy dithering, which no one is actually doing in practice.
CDs have the potential to achieve greater than 90 dB of dynamic range, but why not just shift to 24 bits, since
the hardware and software are already there?"
Waldrep maintains that the CD, which has been around since 1982, is really hard to beat when it comes to
convenience and fidelity. The format has the potential to eclipse analog tape and vinyl LPs, but only if the entire
production chain is up to the task and the engineering/production team are focused on audio fidelity.
He goes on to say that moving to high-resolution PCM audio offers additional fidelity thanks to its increased
specifications. In fact, 24/96 PCM provides an additional octave of frequency response and brings the dynamic
range to the capability of human hearing. The fact that the ultrasonics included in high-resolution audio might
be impossible to hear doesn't deter him.
"It's all about fidelity," he says. "If Wallace Roney is playing his trumpet with a Harmon mute that's outputting
partials well above 20 kHz, and I have microphones and a complete signal path that can capture and reproduce
those frequencies, shouldn't I include them? I'm giving back everything that was being performed. I'm not
willing to arbitrarily roll off the ultrasonics because we haven't proven that humans can't hear them." This might
be an intellectual argument, but there is some evidence that recording at higher than 44.1 or 48 kHz is
perceptible in some way. The jury is still out on that, he says.
I asked Waldrep about the BAS study, and he dismissed it as being completely botched. According to him, "the
examples that were evaluated came primarily from the major labels with a few audiophile recordings as well.
The recordings were either DVD-Audio or SACD discs from the private collections of BAS members. This is
where the issue of provenance becomes important." The term was first applied to the production history of
audio recordings by Waldrep in 2007. "If the original sessions were recorded on analog tape, mixed to a stereo
analog tape, and then mastered to yet another copy, the dynamic range would span about 10-12 bits! How were
the listeners in the BAS study supposed to hear the difference between high-resolution audio and a
downconverted CD version when both had the same dynamic range?
"And the same can be said for the frequency response. Even the new recordings from the Chesky label that were
released on SACD had no frequencies above 20 kHz. The DSD 2.8224 MHz 1-bit format forces all of the 'in-band' noise above the upper range of human hearing in a process known as 'noise shaping.' This is the reason
that DSD at higher rates have appeared: to push the noise out even further."
The BAS study was so seriously flawed, according to Waldrep, that its conclusions are completely invalid. "If
the listeners were attempting to discern a difference between two things that were essentially identical, of
course the results would be the same as random choice! There weren't any real high-resolution audio titles
among those that were auditioned."
So there you have it: high-resolution audio recordings are moving into the mainstream, thanks in large part to
advances in recording and playback technology that make it relatively inexpensive to create, distribute, and
reproduce. But does it offer any real, tangible benefit over CD? It's time for you to weigh in with your thoughts,
opinions, and experiences. I look forward to following the discussion.
