Daniel Knighten
VP, Product Development
#audiotesting #audiotest

In Defense of THD+N

“Measurements, not models!” was a statement an old boss of mine would make periodically when discussing the various attempts over the years to popularize new audio metrics based on algorithms implementing psychoacoustic models. Total Harmonic Distortion and Noise (THD+N) and the closely related Total Harmonic Distortion (THD) are not, and cannot be, the single point of reference for the quality of an audio recording or reproduction system, but they have many virtues and some key advantages compared to metrics that return a mean opinion score (MOS) or other unitless measure of “goodness”.

At a glance and in a single value, THD+N expresses the linearity of a system as the percentage or ratio of the audio energy at its output that is undesirable. The lower the noise and distortion in a system, the more linear the system is and the greater its fidelity to the original signal. Outside of guitar pedals and effects processors, a lower THD+N value is always better.
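
To make that definition concrete, here is a minimal Python sketch. It is not a description of how any particular analyzer works. It assumes one common convention (RMS of everything except the fundamental, divided by RMS of the total signal) and a coherently sampled test tone. The function name `thd_plus_n` and the synthetic signal levels are illustrative assumptions only.

```python
import math
import random

def rms(values):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def thd_plus_n(samples, cycles):
    """THD+N as a ratio: RMS of the residual over RMS of the total signal.

    Assumes `samples` holds exactly `cycles` whole periods of the test tone
    (coherent sampling), so projecting onto one sine/cosine pair isolates
    the fundamental. Anything left over (harmonics, hum, noise, DC) counts
    as undesirable.
    """
    n = len(samples)
    w = 2 * math.pi * cycles / n
    # Single-bin DFT: amplitude of the sine and cosine parts of the fundamental.
    a = 2 / n * sum(x * math.sin(w * i) for i, x in enumerate(samples))
    b = 2 / n * sum(x * math.cos(w * i) for i, x in enumerate(samples))
    residual = [x - a * math.sin(w * i) - b * math.cos(w * i)
                for i, x in enumerate(samples)]
    return rms(residual) / rms(samples)

# Hypothetical device output: a 1 kHz tone at 48 kHz (100 cycles in 4800
# samples) with small 2nd and 3rd harmonics and a little Gaussian noise.
rng = random.Random(0)
n, cycles = 4800, 100
w = 2 * math.pi * cycles / n
signal = [math.sin(w * i)
          + 0.001 * math.sin(2 * w * i)
          + 0.0005 * math.sin(3 * w * i)
          + rng.gauss(0, 0.0002)
          for i in range(n)]

ratio = thd_plus_n(signal, cycles)
print(f"THD+N = {100 * ratio:.4f} %  ({20 * math.log10(ratio):.1f} dB)")
```

The same ratio can be quoted as a percentage or in decibels; the two are just different ways of writing one number.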

A common and valid complaint about THD+N is that it cannot tell you what a system will sound like to a human listener. Yes, a system with lower distortion and noise will generally sound better when scored by human listeners, but research seems to demonstrate a threshold effect: listeners cannot differentiate between systems that have less than a certain amount of THD+N or THD. In addition, different kinds of noise and distortion are offensive to listeners in different ways. It is easy to engineer two systems with identical THD+N levels where the makeup of the harmonic distortion in one will repel listeners compared to the other. A familiar example is the easily observed effect that human listeners tolerate or even enjoy some even-order harmonics while disliking odd-order harmonics.
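
The point about identical levels with different makeup can be sketched in a few lines of Python. This assumes the convention of THD as the root-sum-square of the harmonic amplitudes relative to the fundamental. The two amplitude profiles are hypothetical numbers chosen for illustration, not measurements of any real device.

```python
import math

def thd_ratio(fundamental, harmonics):
    """THD as root-sum-square of harmonic amplitudes over the fundamental.

    `harmonics` maps harmonic order (2, 3, ...) to amplitude, in the same
    units as `fundamental`.
    """
    return math.sqrt(sum(h * h for h in harmonics.values())) / fundamental

# Hypothetical profiles, amplitudes relative to a 1.0 fundamental.
even_dominant = {2: 0.010}            # only a 2nd harmonic
odd_dominant = {3: 0.006, 5: 0.008}   # only 3rd and 5th harmonics

thd_even = thd_ratio(1.0, even_dominant)
thd_odd = thd_ratio(1.0, odd_dominant)

print(f"even-order profile: {100 * thd_even:.2f} % THD")
print(f"odd-order profile:  {100 * thd_odd:.2f} % THD")
```

Both profiles come out at the same single-number THD even though their spectra are entirely different, which is exactly why the single value cannot predict listener preference on its own.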

Further, low THD+N comes at the expense of every other feature in a product. Developing a product with very low THD+N takes more engineering effort and requires higher-cost components that consume more power and take up more space. With all these constraints, many engineers want to know not THD+N but “How does my product sound?”

In response, algorithms have been developed that, in one way or another, strive to answer this question by providing a unitless score of “goodness”. Ideally, these measurements would be published as open standards backed by research studies that demonstrate a strong statistical correlation, or R value, between the output of the algorithm and the scores of statistically meaningful listening panels, but that has not been the case.

Instead, all the solutions I have seen involve black box algorithms with no independently verifiable foundation. This presents several challenges to an engineer trying to characterize their device and optimize its design. But first and foremost, what do you do when your perceptual quality model reports a poor result for your device? I have yet to see a perceptual evaluation metric that explains itself in any useful way. You test your device, it gets a score of between 1 (bad) and 2 (poor) – and then what? In every case that I have seen, the design engineers then need to revert to classic, non-perceptual metrics to understand their device and its behavior.

THD+N is a simple and easy to understand metric. While we would prefer that you use an Audio Precision analyzer to measure it, you can buy competing equipment and reproduce the same measurements between two analyzers from the same or completely unrelated companies. This is not true of any of the psychoacoustic or perceptual quality metrics in circulation.

Furthermore, you can understand the THD+N metric. The single value can be broken down into the contributions from individual distortion products, power supply hum, and other sources of noise. If a device has a poor THD+N metric, it is possible to understand what is contributing and then to engineer a better design. And while I would probably not argue that there is any meaningfully audible difference between -100 and -105 dB THD+N, I do not think there is any question that the leap from -40 dB (1980’s vintage cassette tape) to -80 dB (1990’s vintage digital audio CD) is plainly audible and preferred by human listeners.
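
As a rough illustration of both points, the Python sketch below converts decibel figures to percentages and recombines a set of individual contributions into one THD+N value. It assumes the contributions are uncorrelated, so they add as power (root-sum-square). The breakdown values and the names `db_to_percent`, `ratio_to_db`, and `combine` are hypothetical.

```python
import math

def db_to_percent(db):
    """Convert an amplitude ratio in dB to a percentage."""
    return 100 * 10 ** (db / 20)

def ratio_to_db(ratio):
    """Convert an amplitude ratio to dB."""
    return 20 * math.log10(ratio)

def combine(components):
    """Root-sum-square of uncorrelated contributions (amplitude ratios)."""
    return math.sqrt(sum(v * v for v in components.values()))

# The figures quoted above, expressed as percentages.
for level in (-40, -80, -100, -105):
    print(f"{level} dB = {db_to_percent(level):.5f} %")

# Hypothetical breakdown of one device's THD+N, each entry an amplitude
# ratio relative to the total signal.
breakdown = {
    "2nd harmonic": 3e-5,
    "3rd harmonic": 1e-5,
    "mains hum": 8e-5,
    "broadband noise": 5e-5,
}
total_db = ratio_to_db(combine(breakdown))
dominant = max(breakdown, key=breakdown.get)

print(f"total THD+N = {total_db:.1f} dB, dominated by {dominant}")
```

In this made-up case the hum term sets the total, so reducing the harmonics further would barely move the single number. That is the kind of insight a unitless score does not offer.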

While THD+N cannot stand alone as the single audio quality metric of merit, its simplicity, universality, and reproducibility do mean that if I had to evaluate a device, I would much rather know its THD+N than the MOS score provided by any of the black-box algorithms available.